344 use dagster embedded elt to sync illuminate data #2157

Open: wants to merge 28 commits into base: main

Commits (28)
e3a8d49
feat: dlt embedded elt
cbini Jul 23, 2024
d7efba8
style: trunk
cbini Jul 23, 2024
b5e99e9
build: fix psycopg2-binary
cbini Jul 23, 2024
29fb4f8
build: rm bq creds
cbini Jul 23, 2024
92cf49f
Merge remote-tracking branch 'origin/main' into 344-use-dagster-embed…
cbini Jul 24, 2024
63ac5df
feat: sling resource
cbini Jul 24, 2024
b4c2f2c
Merge branch 'main' into 344-use-dagster-embedded-elt-to-sync-illumin…
cbini Nov 8, 2024
443bf3d
fix: lint
cbini Nov 8, 2024
c39099c
feat: change table
cbini Nov 8, 2024
aa69efe
reset dlt project
cbini Nov 8, 2024
033918f
feat: dlt
cbini Nov 12, 2024
901dbf3
Merge branch 'main' into 344-use-dagster-embedded-elt-to-sync-illumin…
cbini Nov 13, 2024
54e1898
fix: missing vars
cbini Nov 13, 2024
3a35526
fix: add dlt creds
cbini Nov 13, 2024
6de5e69
defer_table_reflect
cbini Nov 13, 2024
924e994
build: try credentials file
cbini Nov 14, 2024
9d6161a
Merge branch 'main' into 344-use-dagster-embedded-elt-to-sync-illumin…
cbini Nov 14, 2024
b1440e5
refactor: dlt creds as dagster objects
cbini Nov 14, 2024
6546d1b
Merge branch 'main' into 344-use-dagster-embedded-elt-to-sync-illumin…
cbini Nov 14, 2024
ae23039
refactor: dlt bq creds
cbini Nov 14, 2024
16458e8
build: fix secret
cbini Nov 14, 2024
b9ad989
build: cleanup
cbini Nov 14, 2024
f4627cd
Merge branch 'main' into 344-use-dagster-embedded-elt-to-sync-illumin…
cbini Nov 14, 2024
6c51b94
Merge branch 'main' into 344-use-dagster-embedded-elt-to-sync-illumin…
cbini Nov 15, 2024
f00a413
build: deps
cbini Nov 15, 2024
3497204
fix: resources
cbini Nov 19, 2024
9a98d93
Merge branch 'main' into 344-use-dagster-embedded-elt-to-sync-illumin…
cbini Nov 19, 2024
d68baac
build: deps
cbini Nov 19, 2024
4 changes: 4 additions & 0 deletions .devcontainer/scripts/postCreate.sh
@@ -53,6 +53,10 @@ op inject -f --in-file=.devcontainer/tpl/dbt_cloud.yml.tpl \
--out-file=env/dbt_cloud.yml &&
sudo mv -f env/dbt_cloud.yml /home/vscode/.dbt/dbt_cloud.yml

op inject -f --in-file=.devcontainer/tpl/gcloud_teamster_dlt_keyfile.json.tpl \
--out-file=env/gcloud_teamster_dlt_keyfile.json &&
sudo mv -f env/gcloud_teamster_dlt_keyfile.json /etc/secret-volume/gcloud_teamster_dlt_keyfile.json

# install pdm dependencies
pdm install --frozen-lockfile

9 changes: 9 additions & 0 deletions .devcontainer/tpl/.env.tpl
@@ -39,6 +39,9 @@ DEANSLIST_SFTP_HOST=op://Data Team/DeansList SFTP/host
DEANSLIST_SFTP_PASSWORD=op://Data Team/DeansList SFTP/password
DEANSLIST_SFTP_USERNAME=op://Data Team/DeansList SFTP/username
DEANSLIST_SUBDOMAIN=op://Data Team/DeansList API/subdomain
DESTINATION__BIGQUERY__CREDENTIALS__CLIENT_EMAIL=op://Data Team/Teamster Service Account - dlt/username
DESTINATION__BIGQUERY__CREDENTIALS__PRIVATE_KEY=op://Data Team/Teamster Service Account - dlt/password
DESTINATION__BIGQUERY__CREDENTIALS__PROJECT_ID=op://Data Team/Teamster Service Account - dlt/project id
EDPLAN_SFTP_HOST=op://Data Team/edplan SFTP - Camden/host
EDPLAN_SFTP_PASSWORD_KIPPCAMDEN=op://Data Team/edplan SFTP - Camden/password
EDPLAN_SFTP_PASSWORD_KIPPNEWARK=op://Data Team/edplan SFTP - Newark/password
@@ -126,6 +129,12 @@ SCHOOLMINT_GROW_CLIENT_SECRET=op://Data Team/SchooMint Grow API/client secret
SCHOOLMINT_GROW_DISTRICT_ID=op://Data Team/SchooMint Grow API/district id
SLACK_TOKEN=op://Data Team/Slack API - Teamster/credential
SMARTRECRUITERS_SMARTTOKEN=op://Data Team/SmartRecruiters API/smart token
SOURCES__SQL_DATABASE__CREDENTIALS__DATABASE=op://Data Team/Illuminate ODBC/database
SOURCES__SQL_DATABASE__CREDENTIALS__DRIVERNAME=op://Data Team/Illuminate ODBC/driver
SOURCES__SQL_DATABASE__CREDENTIALS__HOST=op://Data Team/Illuminate ODBC/ip
SOURCES__SQL_DATABASE__CREDENTIALS__PASSWORD=op://Data Team/Illuminate ODBC/password
SOURCES__SQL_DATABASE__CREDENTIALS__PORT=op://Data Team/Illuminate ODBC/port
SOURCES__SQL_DATABASE__CREDENTIALS__USERNAME=op://Data Team/Illuminate ODBC/username
TABLEAU_PERSONAL_ACCESS_TOKEN=op://Data Team/Tableau Server PAT - Dagster/credential
TABLEAU_SERVER_ADDRESS=op://Data Team/Tableau Server PAT - Dagster/hostname
TABLEAU_SITE_ID=op://Data Team/Tableau Server PAT - Dagster/site id
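
The new `SOURCES__SQL_DATABASE__CREDENTIALS__*` and `DESTINATION__BIGQUERY__CREDENTIALS__*` names follow dlt's environment variable convention, where a double underscore separates configuration sections. The sketch below only illustrates that naming scheme with a hypothetical helper, `nest_env_vars`; it is not how dlt resolves configuration internally, and all values are placeholders.

```python
from typing import Any, Mapping


def nest_env_vars(env: Mapping[str, str], prefix: str) -> dict[str, Any]:
    """Group variables under `prefix` into a nested dict, splitting on "__"."""
    nested: dict[str, Any] = {}
    for name, value in env.items():
        if not name.startswith(prefix + "__"):
            continue
        # Drop the prefix, then treat each "__"-separated part as one level.
        parts = name[len(prefix) + 2:].lower().split("__")
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


if __name__ == "__main__":
    # Placeholder values only; real values are injected from 1Password.
    example_env = {
        "SOURCES__SQL_DATABASE__CREDENTIALS__HOST": "db.example.invalid",
        "SOURCES__SQL_DATABASE__CREDENTIALS__USERNAME": "reader",
        "DESTINATION__BIGQUERY__CREDENTIALS__PROJECT_ID": "example-project",
    }
    print(nest_env_vars(example_env, "SOURCES__SQL_DATABASE"))
```

Single underscores inside a key, as in `PROJECT_ID`, are left alone; only the double underscore marks a new level.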
1 change: 1 addition & 0 deletions .devcontainer/tpl/gcloud_teamster_dlt_keyfile.json.tpl
@@ -0,0 +1 @@
op://Data Team/Teamster Service Account - dlt/keyfile.json
7 changes: 7 additions & 0 deletions .trunk/trunk.yaml
@@ -41,6 +41,12 @@ lint:
linters:
- sqlfluff
- sqlfmt
- paths:
    - src/teamster/libraries/dlt_sources/**
  linters:
    - pyright
    - sqlfluff
    - sqlfmt
- paths:
- src/dbt/**
linters:
@@ -52,6 +58,7 @@ lint:
- markdownlint
- osv-scanner
- oxipng
- pyright
- ruff
- shellcheck
- shfmt
724 changes: 618 additions & 106 deletions pdm.lock

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions pyproject.toml
@@ -15,6 +15,7 @@ dependencies = [
"dagster-k8s",
"dagster-slack",
"dagster-ssh",
"dagster-embedded-elt",
"dbt-bigquery>=1.8,<1.9",
"dbt-core>=1.8,<1.9",
"beautifulsoup4>=4.12.2",
@@ -24,6 +25,7 @@ dependencies = [
"gspread>=5.12.0",
"ldap3>=2.9.1",
"oracledb>=1.4.2",
"psycopg[binary,pool]>=3.2.3",
"py-avro-schema>=3.4.1",
"pycryptodome>=3.19.0",
"pypdf>=5.0.0",
49 changes: 42 additions & 7 deletions requirements.txt
@@ -2,8 +2,12 @@
# Please do not edit it manually.

agate==1.9.1
aiohappyeyeballs==2.4.3
aiohttp==3.11.5
aiosignal==1.3.1
alembic==1.14.0
annotated-types==0.7.0
astunparse==1.6.3
attrs==24.2.0
avro==1.12.0
babel==2.16.0
@@ -26,6 +30,7 @@ dagster-airbyte==0.25.2
dagster-cloud==1.9.2
dagster-cloud-cli==1.9.2
dagster-dbt==0.25.2
dagster-embedded-elt==0.25.2
dagster-fivetran==0.25.2
dagster-gcp==0.25.2
dagster-k8s==0.25.2
@@ -42,12 +47,17 @@ dbt-extractor==0.5.1
dbt-semantic-interfaces==0.5.1
deepdiff==7.0.1
defusedxml==0.7.1
dlt==1.4.0
docstring-parser==0.16
durationpy==0.9
fastavro==1.9.7
filelock==3.16.1
fsspec==2024.10.0; python_version >= "3.12"
frozenlist==1.5.0
fsspec==2024.10.0
gitdb==4.0.11
github3-py==4.0.1
gitpython==3.1.43
giturlparse==0.12.0
google-api-core[grpc]==2.23.0
google-api-python-client==2.153.0
google-auth==2.36.0
@@ -62,23 +72,27 @@ google-resumable-media==2.7.2
googleapis-common-protos[grpc]==1.66.0
greenlet==3.1.1; (platform_machine == "win32" or platform_machine == "WIN32" or platform_machine == "AMD64" or platform_machine == "amd64" or platform_machine == "x86_64" or platform_machine == "ppc64le" or platform_machine == "aarch64") and python_version < "3.13"
grpc-google-iam-v1==0.13.1
grpcio==1.67.1
grpcio==1.68.0
grpcio-health-checking==1.62.3
grpcio-status==1.62.3
gspread==6.1.4
hexbytes==1.2.1
httplib2==0.22.0
humanfriendly==10.0
humanize==4.11.0
idna==3.10
importlib-metadata==6.11.0
isodate==0.6.1
jinja2==3.1.4
joblib==1.4.2
jsonpath-ng==1.7.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
kubernetes==31.0.0
ldap3==2.9.1
leather==0.4.0
logbook==1.5.3
makefun==1.15.6
mako==1.3.6
markdown-it-py==3.0.0
markupsafe==3.0.2
@@ -88,6 +102,7 @@ memoization==0.4.0
minimal-snowplow-tracker==0.0.2
more-itertools==10.5.0
msgpack==1.1.0
multidict==6.1.0
networkx==3.4.2
numpy==2.1.3
oauth2client==4.1.3
@@ -100,11 +115,19 @@ pandas==2.2.3
paramiko==3.5.0
parsedatetime==2.6
pathspec==0.12.1
pathvalidate==3.2.1
pendulum==3.0.0
pex==2.24.1
pluggy==1.5.0
ply==3.11
prompt-toolkit==3.0.36
propcache==0.2.0
proto-plus==1.25.0
protobuf==4.25.5
psutil==6.1.0; platform_system == "Windows"
psycopg-binary==3.2.3; implementation_name != "pypy"
psycopg-pool==3.2.3
psycopg[binary,pool]==3.2.3
py-avro-schema==3.8.2
pyarrow==18.0.0
pyasn1==0.6.1
@@ -114,7 +137,7 @@ pycryptodome==3.21.0
pydantic==2.9.2
pydantic-core==2.23.4
pygments==2.18.0
pyjwt[crypto]==2.9.0
pyjwt[crypto]==2.10.0
pynacl==1.5.0
pyparsing==3.2.0; python_version > "3.0"
pypdf==5.1.0
@@ -130,19 +153,25 @@ questionary==2.0.1
referencing==0.35.1
requests==2.32.3
requests-oauthlib==2.0.0
requirements-parser==0.11.0
rich==13.9.4
rpds-py==0.21.0
rsa==4.9
scikit-learn==1.5.2
scipy==1.14.1
semver==3.0.2
setuptools==75.5.0
shellingham==1.5.4
simplejson==3.19.3
six==1.16.0
slack-sdk==3.33.3
slack-sdk==3.33.4
sling==1.2.22
sling-linux-amd64==1.2.22
smmap==5.0.1
soupsieve==2.6
sqlalchemy==2.0.36
sqlglot[rs]==25.30.0
sqlglotrs==0.2.13
sqlglot[rs]==25.31.4
sqlglotrs==0.2.14
sqlparse==0.5.2
sshtunnel==0.4.0
structlog==24.4.0
@@ -151,11 +180,14 @@ tabulate==0.9.0
tenacity==9.0.0
text-unidecode==1.3
threadpoolctl==3.5.0
time-machine==2.16.0; implementation_name != "pypy"
tomli==2.1.0
tomlkit==0.13.2
toposort==1.10
tqdm==4.67.0
typeguard==4.4.1
typer==0.13.0
typer==0.13.1
types-setuptools==75.5.0.20241121
typing-extensions==4.12.2
tzdata==2024.2
universal-pathlib==0.2.5; python_version >= "3.12"
@@ -164,4 +196,7 @@ urllib3==2.2.3
watchdog==5.0.3
wcwidth==0.2.13
websocket-client==1.8.0
wheel==0.45.0
win-precise-time==1.4.2; os_name == "nt"
yarl==1.17.2
zipp==3.21.0
@@ -92,7 +92,7 @@
partitions_def=DEANSLIST_FISCAL_MULTI_PARTITIONS_DEF,
op_tags={
"dagster-k8s/config": {
"container_config": {"resources": {"limits": {"memory": "4.5Gi"}}}
"container_config": {"resources": {"limits": {"memory": "5.0Gi"}}}
}
},
)
55 changes: 55 additions & 0 deletions src/teamster/code_locations/kipptaf/dagster-cloud.yaml
@@ -41,6 +41,11 @@ locations:
items:
- key: service_account_gserviceaccount.json
path: gcloud_service_account_json
- secret:
    name: op-gcp-service-account-dlt
    items:
      - key: keyfile.json
        path: gcloud_teamster_dlt_keyfile.json
server_k8s_config:
container_config:
env:
@@ -374,6 +379,31 @@ locations:
secretKeyRef:
name: op-slack-api
key: credential
- name: ILLUMINATE_DB_DRIVERNAME
  valueFrom:
    secretKeyRef:
      name: op-illuminate-db
      key: driver
- name: ILLUMINATE_DB_HOST
  valueFrom:
    secretKeyRef:
      name: op-illuminate-db
      key: ip
- name: ILLUMINATE_DB_DATABASE
  valueFrom:
    secretKeyRef:
      name: op-illuminate-db
      key: database
- name: ILLUMINATE_DB_USERNAME
  valueFrom:
    secretKeyRef:
      name: op-illuminate-db
      key: username
- name: ILLUMINATE_DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: op-illuminate-db
      key: password
run_k8s_config:
container_config:
env:
@@ -707,3 +737,28 @@ locations:
secretKeyRef:
name: op-slack-api
key: credential
- name: ILLUMINATE_DB_DRIVERNAME
  valueFrom:
    secretKeyRef:
      name: op-illuminate-db
      key: driver
- name: ILLUMINATE_DB_HOST
  valueFrom:
    secretKeyRef:
      name: op-illuminate-db
      key: ip
- name: ILLUMINATE_DB_DATABASE
  valueFrom:
    secretKeyRef:
      name: op-illuminate-db
      key: database
- name: ILLUMINATE_DB_USERNAME
  valueFrom:
    secretKeyRef:
      name: op-illuminate-db
      key: username
- name: ILLUMINATE_DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: op-illuminate-db
      key: password
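
The asset module added in this PR reads the five `ILLUMINATE_DB_*` variables when it is imported, and only the driver name is wrapped in `_check.not_none`. A small pre-flight check along the following lines could surface a missing secret mapping early. This is a sketch: `REQUIRED_VARS` and `missing_vars` are hypothetical names, not part of the PR.

```python
import os
from typing import Mapping

# Names taken from the secretKeyRef entries added to dagster-cloud.yaml.
REQUIRED_VARS = (
    "ILLUMINATE_DB_DRIVERNAME",
    "ILLUMINATE_DB_HOST",
    "ILLUMINATE_DB_DATABASE",
    "ILLUMINATE_DB_USERNAME",
    "ILLUMINATE_DB_PASSWORD",
)


def missing_vars(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    # Prints an empty list when every variable is present.
    print(missing_vars(os.environ))
```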
4 changes: 4 additions & 0 deletions src/teamster/code_locations/kipptaf/definitions.py
@@ -5,6 +5,7 @@
build_sensor_for_freshness_checks,
load_assets_from_modules,
)
from dagster_embedded_elt.dlt import DagsterDltResource
from dagster_k8s import k8s_job_executor

from teamster.code_locations.kipptaf import (
@@ -19,6 +20,7 @@
deanslist,
extracts,
fivetran,
illuminate,
ldap,
overgrad,
performance_management,
Expand Down Expand Up @@ -54,6 +56,7 @@
amplify,
extracts,
deanslist,
illuminate,
ldap,
overgrad,
performance_management,
Expand Down Expand Up @@ -97,6 +100,7 @@
"db_bigquery": BIGQUERY_RESOURCE,
"dbt_cli": get_dbt_cli_resource(DBT_PROJECT),
"dds": resources.DIBELS_DATA_SYSTEM_RESOURCE,
"dlt": DagsterDltResource(),
"fivetran": resources.FIVETRAN_RESOURCE,
"gcs": GCS_RESOURCE,
"google_directory": resources.GOOGLE_DIRECTORY_RESOURCE,
5 changes: 5 additions & 0 deletions src/teamster/code_locations/kipptaf/illuminate/__init__.py
@@ -0,0 +1,5 @@
from teamster.code_locations.kipptaf.illuminate.assets import assets

__all__ = [
    "assets",
]
45 changes: 45 additions & 0 deletions src/teamster/code_locations/kipptaf/illuminate/assets.py
@@ -0,0 +1,45 @@
import json

from dagster import AssetExecutionContext, EnvVar, _check
from dagster_embedded_elt.dlt import DagsterDltResource, dlt_assets
from dlt import pipeline
from dlt.sources.sql_database import sql_database
from sqlalchemy import URL, create_engine


@dlt_assets(
    dlt_source=sql_database(
        credentials=create_engine(
            url=URL.create(
                drivername=_check.not_none(
                    value=EnvVar("ILLUMINATE_DB_DRIVERNAME").get_value()
                ),
                host=EnvVar("ILLUMINATE_DB_HOST").get_value(),
                database=EnvVar("ILLUMINATE_DB_DATABASE").get_value(),
                username=EnvVar("ILLUMINATE_DB_USERNAME").get_value(),
                password=EnvVar("ILLUMINATE_DB_PASSWORD").get_value(),
            )
        ),
        schema="dna_assessments",
        table_names=["assessments", "agg_student_responses_standard"],
        defer_table_reflect=True,
    ),
    dlt_pipeline=pipeline(
        pipeline_name="illuminate",
        destination="bigquery",
        dataset_name="dlt_illuminate_dna_assessments",
        progress="log",
    ),
)
def illuminate_dna_assessments(context: AssetExecutionContext, dlt: DagsterDltResource):
    yield from dlt.run(
        context=context,
        credentials=json.load(
            fp=open(file="/etc/secret-volume/gcloud_teamster_dlt_keyfile.json")
        ),
    )


assets = [
    illuminate_dna_assessments,
]
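
The asset passes `open(...)` straight to `json.load`, so the file handle is never closed explicitly. One possible alternative, sketched here under the assumption that the keyfile is plain UTF-8 JSON, is a small helper that reads the file inside a context manager. `load_keyfile` and `DLT_KEYFILE_PATH` are hypothetical names, not part of the PR.

```python
import json
from pathlib import Path
from typing import Any

# Path used by the asset in this diff.
DLT_KEYFILE_PATH = "/etc/secret-volume/gcloud_teamster_dlt_keyfile.json"


def load_keyfile(path: str = DLT_KEYFILE_PATH) -> dict[str, Any]:
    """Read a service account keyfile and close the file handle afterwards."""
    with Path(path).open(encoding="utf-8") as fp:
        return json.load(fp)
```

With a helper like this, the call in the asset body could become `dlt.run(context=context, credentials=load_keyfile())`, which keeps the mounted path in one place.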