FE-349 data cleanup for HCA ingest manifest.py (#340)
* Adding data sanitizing: strip whitespace from each row's fields and make institution names uppercase.
Also removed the load_hca partition creation (that pipeline is no longer used) and added a TEST location for dev.

* Adding the location of the Scala image for transparency

* Many attempts to clear the cache

* NEVERMIND, let's update Poetry
python-poetry/poetry#3439

* Putting my workflows back

* Update requirements.txt

---------

Co-authored-by: dsp-fieldeng-bot <[email protected]>
bahill and dsp-fieldeng-bot authored Nov 15, 2024
1 parent a3f3572 commit 67a0629
Showing 10 changed files with 18 additions and 14 deletions.
1 change: 1 addition & 0 deletions .github/workflows/build_and_publish_dev.yaml
@@ -25,6 +25,7 @@ jobs:
           java-version: [email protected]
       - name: Push Scala Dataflow Docker image
         run: sbt publish
+        # us.gcr.io/broad-dsp-gcr-public/hca-transformation-pipeline
       - name: Get artifact slug
         id: get-artifact-slug
         run: 'echo ::set-output name=slug::$(git rev-parse --short "$GITHUB_SHA")'
1 change: 1 addition & 0 deletions .github/workflows/build_and_publish_main.yaml
@@ -29,6 +29,7 @@ jobs:
         run: gcloud auth configure-docker --quiet us.gcr.io,us-east4-docker.pkg.dev
       - name: Push Scala Dataflow Docker image
         run: sbt publish
+        # us.gcr.io/broad-dsp-gcr-public/hca-transformation-pipeline
       - name: Get artifact slug
         id: get-artifact-slug
         run: 'echo ::set-output name=slug::$(git rev-parse --short "$GITHUB_SHA")'
2 changes: 1 addition & 1 deletion .github/workflows/generate-requirements-file.yaml
@@ -22,7 +22,7 @@ jobs:
         with:
           python-version: 3.9.16
       - name: Install Poetry
-        uses: snok/install-poetry@v1.2
+        uses: snok/install-poetry@v1
         with:
           version: 1.1.9
           virtualenvs-create: true
5 changes: 3 additions & 2 deletions .github/workflows/validate_pull_request_main.yaml
@@ -23,15 +23,16 @@ jobs:
         with:
           python-version: 3.9.16
       - name: Install Poetry
-        uses: snok/install-poetry@v1.2
+        uses: snok/install-poetry@v1
         with:
-          version: 1.1.9
+          version: 1.8.0
       - name: Restore cache dependencies
         uses: actions/cache@v2
         env:
           cache-name: cache-poetry-v2
         with:
           path: ~/.cache/pypoetry
+          # key uses pyproject.toml hash, so it's unique to each version of pyproject.toml
           key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('./orchestration/pyproject.toml') }}
           restore-keys: |
             ${{ runner.os }}-build-${{ env.cache-name }}-
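The comment added in this workflow notes that the cache key embeds a hash of `pyproject.toml`. Because `restore-keys` supplies only the bare prefix, a miss on the exact key can still restore an older cache that shares that prefix. The Python sketch below models that lookup order. It is an approximation under stated assumptions: GitHub's `hashFiles()` combines per-file SHA-256 digests, so the single SHA-256 used here will not equal the real key, and `available` is a hypothetical list of existing cache keys ordered oldest to newest (for a prefix hit, actions/cache restores the most recently created match).

```python
import hashlib
from typing import List, Optional, Sequence


def cache_key(runner_os: str, cache_name: str, pyproject_bytes: bytes) -> str:
    """Build a key shaped like the workflow's `key:` expression.

    Approximation: hashFiles() hashes per-file SHA-256 digests again,
    so this single SHA-256 of the file bytes is not GitHub's exact value.
    """
    digest = hashlib.sha256(pyproject_bytes).hexdigest()
    return f"{runner_os}-build-{cache_name}-{digest}"


def select_cache(key: str, restore_prefixes: Sequence[str],
                 available: List[str]) -> Optional[str]:
    """Return the cache key that would be restored, or None on a full miss.

    `available` is a hypothetical list of existing keys, oldest first.
    """
    # 1. An exact match on `key` always wins.
    if key in available:
        return key
    # 2. Otherwise try each restore-keys prefix in order; for a prefix hit,
    #    the most recently created matching cache is chosen.
    for prefix in restore_prefixes:
        matches = [k for k in available if k.startswith(prefix)]
        if matches:
            return matches[-1]
    return None


if __name__ == "__main__":
    prefix = "Linux-build-cache-poetry-v2-"
    old = cache_key("Linux", "cache-poetry-v2", b"version = 1\n")
    new = cache_key("Linux", "cache-poetry-v2", b"version = 2\n")
    # pyproject.toml changed, so `new` misses, but the prefix restores `old`.
    print(select_cache(new, [prefix], [old]) == old)  # True
```

In other words, the key is unique per `pyproject.toml` revision, but the prefix fallback means a changed file can still start from a previous cache.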
2 changes: 1 addition & 1 deletion .github/workflows/validate_python.yaml
@@ -28,7 +28,7 @@ jobs:
         with:
           python-version: 3.9.16
       - name: Install Poetry
-        uses: snok/install-poetry@v1.2
+        uses: snok/install-poetry@v1
         with:
           version: 1.1.9
       - name: Cache dependencies
2 changes: 1 addition & 1 deletion orchestration/Dockerfile
@@ -9,7 +9,7 @@ ENV PYTHONFAULTHANDLER=1 \
     PIP_NO_CACHE_DIR=off \
     PIP_DISABLE_PIP_VERSION_CHECK=on \
     PIP_DEFAULT_TIMEOUT=100 \
-    POETRY_VERSION=1.1.8 \
+    POETRY_VERSION=1.1.9 \
     SENTRY_DSN=https://[email protected]/4506559533088768
 
 RUN pip install "poetry==$POETRY_VERSION"
6 changes: 3 additions & 3 deletions orchestration/hca_manage/manifest.py
@@ -54,6 +54,7 @@
     "dev": {
         "EBI": "gs://broad-dsp-monster-hca-dev-ebi-staging/dev",
         "UCSC": "gs://broad-dsp-monster-hca-dev-ebi-staging/dev",
+        "TEST": "gs://broad-dsp-monster-hca-prod-ebi-storage/broad_test_dataset"
     }
 }
 ENV_PIPELINE_ENDINGS = {
@@ -101,15 +102,15 @@ def _parse_csv(csv_path: str, env: str, project_id_only: bool = False,
                 continue
 
             assert len(row) == 2
-            institution = row[0]
+            row = [x.strip() for x in row]
+            institution = row[0].upper()
             project_id = find_project_id_in_str(row[1])
 
             key = None
             if project_id_only:
                 project_id = row[1]
                 key = project_id
             else:
-                # TODO check for all caps - change to all caps if not, then match
                 if institution not in STAGING_AREA_BUCKETS[env]:
                     raise Exception(f"Unknown institution {institution} found")
 
@@ -178,7 +179,6 @@ def _enumerate_manifests(env: str) -> None:
 
 
 def load(args: argparse.Namespace) -> None:
-    parse_and_load_manifest(args.env, args.csv_path, args.release_tag, "load_hca")
     parse_and_load_manifest(args.env, args.csv_path, args.release_tag, "per_project_load_hca")
     parse_and_load_manifest(args.env, args.csv_path, args.release_tag, "validate_ingress")
     parse_and_load_manifest(
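The `_parse_csv` change strips whitespace from every field and uppercases the institution before looking it up in `STAGING_AREA_BUCKETS`, which resolves the removed TODO: a row such as ` ebi , <project id>` now matches the `EBI` entry instead of failing. A minimal, self-contained sketch of that row handling follows. Assumptions: `parse_manifest_rows` and `find_project_id` are hypothetical names; the real `find_project_id_in_str` is not shown in the diff, so the stand-in assumes project IDs are UUIDs; blank rows are skipped here because the original skip condition is not visible; and the `project_id_only` branch is omitted.

```python
import csv
import io
import re
from typing import Dict, List, Tuple

# Dev entries copied from the STAGING_AREA_BUCKETS hunk above (prod omitted).
STAGING_AREA_BUCKETS: Dict[str, Dict[str, str]] = {
    "dev": {
        "EBI": "gs://broad-dsp-monster-hca-dev-ebi-staging/dev",
        "UCSC": "gs://broad-dsp-monster-hca-dev-ebi-staging/dev",
        "TEST": "gs://broad-dsp-monster-hca-prod-ebi-storage/broad_test_dataset",
    }
}

# Hypothetical stand-in for find_project_id_in_str: assumes UUID project IDs.
_UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def find_project_id(text: str) -> str:
    match = _UUID_RE.search(text)
    if match is None:
        raise ValueError(f"No project ID found in {text!r}")
    return match.group(0)


def parse_manifest_rows(csv_text: str, env: str) -> List[Tuple[str, str]]:
    """Return (institution, project_id) pairs after sanitizing each row."""
    results = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # assumption: blank lines are skipped
            continue
        assert len(row) == 2
        row = [x.strip() for x in row]  # drop stray whitespace around fields
        institution = row[0].upper()    # " ebi " -> "EBI"
        project_id = find_project_id(row[1])
        if institution not in STAGING_AREA_BUCKETS[env]:
            raise Exception(f"Unknown institution {institution} found")
        results.append((institution, project_id))
    return results


if __name__ == "__main__":
    # Made-up sample rows for illustration.
    sample = (" ebi , 0fd8f918-62d6-4b8b-ac35-4c53dd601f71\n\n"
              "test,1c6a960d-52ac-44ea-b30c-1a1b2f6a0ee7\n")
    print(parse_manifest_rows(sample, "dev"))
    # [('EBI', '0fd8f918-62d6-4b8b-ac35-4c53dd601f71'), ('TEST', '1c6a960d-52ac-44ea-b30c-1a1b2f6a0ee7')]
```

Normalizing before the lookup keeps the bucket mapping itself in a single canonical (uppercase) form, so no aliases are needed for differently cased or padded spellings.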
7 changes: 4 additions & 3 deletions orchestration/poetry.lock

Generated file; diff not rendered.

4 changes: 2 additions & 2 deletions orchestration/pyproject.toml
@@ -14,7 +14,7 @@ cffi = "1.16.0"
 # TODO: we'll probably want to use just the dagster version here and not the API versions as well
 # https://github.com/dagster-io/dagster/blob/master/MIGRATION.md#migrating-to-10
 dagster = "0.12.14"
-dagster-gcp = "^0.12.14"
+dagster-gcp = "0.12.14"
 dagster-k8s = "0.12.14"
 dagster-postgres = "0.12.14"
 dagster-slack = "0.12.14"
@@ -58,7 +58,7 @@ soft_delete = "hca_manage.soft_delete:run"
 job = "hca_manage.job:fetch_job_info"
 
 [build-system]
-requires = ["poetry-core=^1.1.8"]
+requires = ["poetry-core<=1.1.9"]
 build-backend = "poetry.core.masonry.api"
 
 [tool.autopep8]
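This hunk changes `dagster-gcp` from a Poetry caret constraint to an exact pin, matching the other dagster packages. It also replaces the build requirement `poetry-core=^1.1.8`, which mixes Poetry's caret syntax into a `[build-system] requires` entry that expects a PEP 508 specifier, with `poetry-core<=1.1.9`. To make the caret difference concrete, here is a small Python sketch of the expansion rule described in Poetry's documentation (for example, `^0.12.14` allows `>=0.12.14,<0.13.0`). It is an illustration written for this note, not Poetry's implementation, and it only handles plain numeric `MAJOR[.MINOR[.PATCH]]` versions.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse 'MAJOR[.MINOR[.PATCH]]' into a 3-tuple, padding with zeros."""
    parts = [int(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported version: {text!r}")
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def caret_bounds(spec: str) -> Tuple[Version, Version]:
    """Return (inclusive lower, exclusive upper) bounds for a caret spec.

    Documented rule: bump the left-most non-zero component that was given;
    if every given component is zero, bump the last one given.
    """
    raw = spec.lstrip("^")
    lower = parse_version(raw)  # also validates the component count
    parts = [int(p) for p in raw.split(".")]
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:idx] + [parts[idx] + 1] + [0] * (2 - idx)
    return lower, (upper[0], upper[1], upper[2])


def allows(spec: str, version: str) -> bool:
    """True if `version` satisfies a caret spec or a bare (exact) version."""
    v = parse_version(version)
    if spec.startswith("^"):
        lower, upper = caret_bounds(spec)
        return lower <= v < upper
    # In Poetry, a bare version string such as "0.12.14" is an exact pin.
    return v == parse_version(spec)


if __name__ == "__main__":
    print(allows("^0.12.14", "0.12.20"))  # True: patch updates allowed
    print(allows("0.12.14", "0.12.20"))   # False: exact pin
```

Pinning `dagster-gcp` exactly means it can no longer drift to a newer `0.12.x` patch release independently of the other dagster packages.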
2 changes: 1 addition & 1 deletion orchestration/requirements.txt
@@ -24,7 +24,7 @@ dagster==0.12.14
 data-repo-client==1.542.0
 docstring-parser==0.15; python_version >= "3.9" and python_version < "3.10"
 frozenlist==1.4.0; python_version >= "3.9" and python_version < "3.10" and python_full_version >= "3.6.0"
-google-api-core==2.19.0; python_version >= "3.9" and python_version < "3.10" and (python_version >= "3.9" and python_full_version < "3.0.0" and python_version < "3.10" or python_full_version >= "3.6.0" and python_version >= "3.9" and python_version < "3.10") and (python_version >= "3.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.7")
+google-api-core==2.23.0; python_version >= "3.9" and python_version < "3.10" and (python_version >= "3.9" and python_full_version < "3.0.0" and python_version < "3.10" or python_full_version >= "3.6.0" and python_version >= "3.9" and python_version < "3.10") and (python_version >= "3.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.7")
 google-api-python-client==1.12.11; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0"
 google-auth-httplib2==0.1.1; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0"
 google-auth==2.23.3; python_version >= "3.9" and python_full_version < "3.0.0" and python_version < "3.10" or python_full_version >= "3.6.0" and python_version >= "3.9" and python_version < "3.10"
