Lot3 work #958

Merged: 86 commits merged into main from feature/lot3 on May 2, 2024
Conversation

sambles (Contributor) commented Jan 30, 2024

**IMPORTANT:** Please attach or create an issue after submitting a Pull Request.

Lot3 work

  • Added SQL endpoints for exposure data queries (currently disabled)
  • Replaced the remote file storage manager with the OasisDataManager package
  • Added the option to read model_data from a remote object store (S3, Azure Blob Storage)

There are two sets of 'remote storage' configuration options.

  • General storage, for files such as results and exposure data held in an object store - env var prefix = OASIS_{option}
  • Model data storage, for reading 'model_data' during an analysis execution - env var prefix = OASIS_WORKER_MODEL_DATA_{option}

For the valid options, see src/common/filestore/filestore.py:

S3 / AWS options:

```python
root_dir=settings.get(section, 'ROOT_DIR', fallback=""),
bucket_name=settings.get(section, 'AWS_BUCKET_NAME'),
access_key=settings.get(section, 'AWS_ACCESS_KEY_ID', fallback=None),
secret_key=settings.get(section, 'AWS_SECRET_ACCESS_KEY', fallback=None),
endpoint_url=settings.get(section, 'AWS_S3_ENDPOINT_URL', fallback=None),
file_overwrite=settings.getboolean(section, 'AWS_S3_FILE_OVERWRITE', fallback=True),
object_parameters=settings.get(section, 'AWS_S3_OBJECT_PARAMETERS', fallback={}),
auto_create_bucket=settings.getboolean(section, 'AWS_AUTO_CREATE_BUCKET', fallback=False),
default_acl=settings.get(section, 'AWS_DEFAULT_ACL', fallback=None),
bucket_acl=settings.get(
    section,
    'AWS_BUCKET_ACL',
    fallback=settings.get(section, 'AWS_DEFAULT_ACL', fallback=None),
),
querystring_auth=settings.getboolean(section, 'AWS_QUERYSTRING_AUTH', fallback=False),
querystring_expire=settings.get(section, 'AWS_QUERYSTRING_EXPIRE', fallback=604800),
reduced_redundancy=settings.getboolean(section, 'AWS_REDUCED_REDUNDANCY', fallback=False),
location=settings.get(section, 'AWS_LOCATION', fallback=''),
encryption=settings.getboolean(section, 'AWS_S3_ENCRYPTION', fallback=False),
security_token=settings.get(section, 'AWS_SECURITY_TOKEN', fallback=None),
secure_urls=settings.getboolean(section, 'AWS_S3_SECURE_URLS', fallback=True),
file_name_charset=settings.get(section, 'AWS_S3_FILE_NAME_CHARSET', fallback='utf-8'),
gzip=settings.getboolean(section, 'AWS_IS_GZIPPED', fallback=False),
preload_metadata=settings.getboolean(section, 'AWS_PRELOAD_METADATA', fallback=False),
url_protocol=settings.get(section, 'AWS_S3_URL_PROTOCOL', fallback='http:'),
region_name=settings.get(section, 'AWS_S3_REGION_NAME', fallback=None),
use_ssl=settings.getboolean(section, 'AWS_S3_USE_SSL', fallback=True),
verify=settings.get(section, 'AWS_S3_VERIFY', fallback=None),
max_memory_size=settings.get(section, 'AWS_S3_MAX_MEMORY_SIZE', fallback=0),
shared_bucket=settings.getboolean(section, 'AWS_SHARED_BUCKET', fallback=False),
aws_log_level=settings.get(section, 'AWS_LOG_LEVEL', fallback=''),
gzip_content_types=settings.get(section, 'GZIP_CONTENT_TYPES', fallback=(
    'text/css',
    'text/javascript',
    'application/javascript',
    'application/x-javascript',
    'image/svg+xml',
)),
cache_dir=settings.get(section, 'CACHE_DIR', fallback='/tmp/data-cache'),
```
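
As an illustration of the general-storage prefix rule above, a minimal S3 configuration for the server/worker shared filestore might look like the snippet below. The bucket name and endpoint are placeholders, and the STORAGE_TYPE option is assumed to mirror the model-data example further down; confirm both against src/common/filestore/filestore.py.

```
OASIS_STORAGE_TYPE: S3
OASIS_AWS_BUCKET_NAME: example-oasis-server-storage
OASIS_AWS_ACCESS_KEY_ID: {_ID_}
OASIS_AWS_SECRET_ACCESS_KEY: {_KEY_}
OASIS_AWS_S3_ENDPOINT_URL: https://s3.example.com
```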

Azure Blob Storage options:

```python
root_dir=settings.get(section, 'ROOT_DIR', fallback=""),
account_name=settings.get(section, 'AZURE_ACCOUNT_NAME'),
account_key=settings.get(section, 'AZURE_ACCOUNT_KEY'),
azure_container=settings.get(section, 'AZURE_CONTAINER'),
location=settings.get(section, 'AZURE_LOCATION', fallback=''),
connection_string=settings.get(section, 'AZURE_CONNECTION_STRING', fallback=None),
shared_container=settings.get(section, 'AZURE_SHARED_CONTAINER', fallback=True),
azure_ssl=settings.get(section, 'AZURE_SSL', fallback=True),
upload_max_conn=settings.get(section, 'AZURE_UPLOAD_MAX_CONN', fallback=2),
timeout=settings.get(section, 'AZURE_CONNECTION_TIMEOUT_SECS', fallback=20),
max_memory_size=settings.get(section, 'AZURE_BLOB_MAX_MEMORY_SIZE', fallback=2 * 1024 * 1024),
expiration_secs=settings.get(section, 'AZURE_URL_EXPIRATION_SECS', fallback=None),
overwrite_files=settings.get(section, 'AZURE_OVERWRITE_FILES', fallback=True),
default_content_type=settings.get(section, 'AZURE_DEFAULT_CONTENT', fallback='application/octet-stream'),
cache_control=settings.get(section, 'AZURE_CACHE_CONTROL', fallback=None),
sas_token=settings.get(section, 'AZURE_SAS_TOKEN', fallback=None),
custom_domain=settings.get(section, 'AZURE_CUSTOM_DOMAIN', fallback=None),
token_credential=settings.get(section, 'AZURE_TOKEN_CREDENTIAL', fallback=None),
azure_log_level=settings.get(section, 'AWS_LOG_LEVEL', fallback=logging.ERROR),  # note: reads the AWS_LOG_LEVEL key
cache_dir=settings.get(section, 'CACHE_DIR', fallback='/tmp/data-cache'),
endpoint_url=settings.get(section, 'ENDPOINT_URL', fallback=None),
```
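
A sketch of the equivalent Azure Blob Storage configuration, using the option names above. The account, key, and container values are placeholders, and the exact STORAGE_TYPE string for Azure should be taken from src/common/filestore/filestore.py rather than this example.

```
OASIS_STORAGE_TYPE: <azure storage type, see filestore.py>
OASIS_AZURE_ACCOUNT_NAME: {_ACCOUNT_}
OASIS_AZURE_ACCOUNT_KEY: {_KEY_}
OASIS_AZURE_CONTAINER: example-oasis-server-storage
```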

Example (model data storage from S3):

```
OASIS_WORKER_MODEL_DATA_STORAGE_TYPE: S3
OASIS_WORKER_MODEL_DATA_AWS_BUCKET_NAME: oasislmf-model-library-oasis-piwind
OASIS_WORKER_MODEL_DATA_AWS_ACCESS_KEY_ID: {_ID_}
OASIS_WORKER_MODEL_DATA_AWS_SECRET_ACCESS_KEY: {_KEY_}
OASIS_WORKER_MODEL_DATA_ROOT_DIR: model_data/
```

When set on a worker container, these values are used to create a model_storage.json file, which is then passed into the execution command so that model data is pulled directly from the bucket named in OASIS_WORKER_MODEL_DATA_AWS_BUCKET_NAME.
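
The actual model_storage.json schema is owned by OasisDataManager, so the following is only an illustrative sketch of the mechanism: the worker's OASIS_WORKER_MODEL_DATA_* settings are collected and serialised to a JSON file whose path is handed to the execution command.

```python
# Illustrative sketch only: the real model_storage.json schema is defined by
# OasisDataManager, and the key names produced here are hypothetical.
import json
import os

PREFIX = "OASIS_WORKER_MODEL_DATA_"

# Collect the worker's model-data storage settings from the environment.
storage_settings = {
    key[len(PREFIX):].lower(): value
    for key, value in os.environ.items()
    if key.startswith(PREFIX)
}

# Write the settings to model_storage.json; the file path is then passed into
# the execution command so model data is read straight from the object store.
with open("model_storage.json", "w") as f:
    json.dump(storage_settings, f, indent=2)
```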

jamesoutterside and others added 30 commits July 24, 2023 15:02
* Readded sql endpoint

* Split raw files for sql processing. Amend sql view to support both.

* Added tests for output api.

* Update reader and block endpoints when reader does not support SQL.
* Readded sql endpoint

* update keycloak

* Set version 2.2.0

* Updated Package Requirements: django==3.2.20

* retest

* Update changelog

* Feautre/1323 reorganize branches plat2 (#849)

* Update CI plat2

* Update readme title

* Fix piwind branch select

* Updated Package Requirements: pyYaml==5.3.1

* fix

* test CI workflow without CVE error

---------

Co-authored-by: awsbuild <[email protected]>

* Fix/migrations plat1 to plat2 (#862)

* nuke all current migration files

* Add in platform 1 migrations (version 1.28.0)

* Apply platform2 migrations ontop of plat1

* Move all of ssl connection string to variable

* Update deploy script

* test data retention -- testing only

* Add helper script to support 2.2.0 and below

* Add support to migration between plat2 versions

* f

* fix

* tidy

* Revert "test data retention -- testing only"

This reverts commit a64e0a8.

* Updated Package Requirements: pyyaml==6.0.1

* trigger retest

* stricter checking for missing migrations

Revert "Revert "test data retention -- testing only""

This reverts commit ecf766d.

fix

Revert "Revert "Revert "test data retention -- testing only"""

This reverts commit 8e4474be5a11869571d10b31ab0ca7b6462e6988.

---------

Co-authored-by: awsbuild <[email protected]>

* Fix to tag piwind repo on publish (#865)

* Fix to tag piwind repo on publish

Extract prev versions for latest released worker

Disable guards to test release script

Set version 2.2.1rc2

Add option to set min CVE errors

Fix

fix

Fix ktools tag and latest publish

Fix boolean

Update changelog

Update changelog

test push git tag

Update changelog

test piwind tag from remote workflow

test

Revert "test"

This reverts commit 82f8ae7.

f

finish pub script

* Revert files edited in release testing

* switch piwind tag to main branch

* Fix cryptography CVE-2023-38325 - platform 2 (#873)

* Updated Package Requirements: cryptography==41.0.2 autobahn pyopenssl

* Updated Package Requirements: certifi==2023.7.22

* retest

---------

Co-authored-by: awsbuild <[email protected]>

---------

Co-authored-by: Sam Gamble <[email protected]>
Co-authored-by: awsbuild <[email protected]>
Co-authored-by: sambles <[email protected]>
Co-authored-by: Dan Bate <[email protected]>
sambles (Contributor, Author) commented Apr 16, 2024

The error `retry: Retry in 6s: OasisException("Could not find events data file: ['events_p.bin', 'events_p.bin']")` is caused by invalid bucket credentials. This needs to be checked and caught before preparing the model's run dir, e.g.:

```
(Pdb) model_storage.listdir()
*** PermissionError: The request signature we calculated does not match the signature you provided. Check your key and signing method.
```
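
A hedged sketch of such a pre-flight check, assuming only that `model_storage` is an OasisDataManager storage instance exposing `listdir()` as in the pdb session above; the wrapper function and error message are illustrative, not the implemented fix.

```python
from oasis_data_manager.errors import OasisException


def check_model_storage_access(model_storage):
    """Fail fast on bad object-store credentials before the run dir is prepared."""
    try:
        model_storage.listdir()
    except PermissionError as e:
        # Surface a clear credentials error instead of the misleading
        # "Could not find events data file" retry loop later in the run.
        raise OasisException(
            f"Model data storage access denied, check credentials: {e}"
        ) from e
```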

sambles (Contributor, Author) commented May 1, 2024

```
[2024-05-01 09:17:57,658: INFO/ForkPoolWorker-1] RUNNING: oasislmf.execution.bin.prepare_run_directory
[2024-05-01 09:17:57,659: WARNING/ForkPoolWorker-1]
  0%|          | 0/1 [00:00<?, ?it/s]
[2024-05-01 09:17:57,659: INFO/ForkPoolWorker-1] run_analysis[1accf3a6-fd79-49d8-94a3-7920ed6e7366]: Exception: <class 'oasis_data_manager.errors.OasisException'>: Error preparing the 'run' directory: '/tmp/tmpa4zrb8bx/input/oasis' and '/tmp/tmpa4zrb8bx/input/oasis' are the same file
[2024-05-01 09:17:57,711: INFO/ForkPoolWorker-1] Store file: /var/log/oasis/tasks/analysis_1_1accf3a6-fd79-49d8-94a3-7920ed6e7366.log -> d9c7e404160744949b43a2db39f376d8.log
```

This is caused by setting OASIS_AWS_LOCATION on the storage manager.

sambles added the "feature" label (A main feature, captured on the backlog) on May 2, 2024
sambles merged commit 0250a94 into main on May 2, 2024 (26 checks passed)
sambles linked an issue on May 2, 2024 that may be closed by this pull request: Lot3 - Load model data from object storage
sambles deleted the feature/lot3 branch on August 10, 2024