Support persisting the LoadMode.VIRTUALENV directory #1079

Merged (55 commits) on Aug 16, 2024

@tatiana (Collaborator) commented Jul 5, 2024

Description

Added virtualenv_dir as an option to ExecutionConfig, which is then propagated downstream to DbtVirtualenvBaseOperator.

The following now happens:

  • If the flag is set, the operator will attempt to locate the venv's Python binary under the provided virtualenv_dir.
    • If the binary is found, it will conclude that the venv exists and continue without creating a new one.
    • If not, it will create a new venv at virtualenv_dir.
  • If the flag is not set, it will simply continue using the temporary directory solution that was already in place.
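The decision flow above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: it uses only the standard library's `venv` module, and the helper names `python_binary` and `resolve_venv` are hypothetical.

```python
import os
import tempfile
import venv
from pathlib import Path
from typing import Optional, Tuple


def python_binary(venv_dir: Path) -> Path:
    """Hypothetical helper: where the stdlib venv module places the interpreter."""
    if os.name == "nt":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def resolve_venv(virtualenv_dir: Optional[Path]) -> Tuple[Path, bool]:
    """Return (venv path, whether it was newly created).

    Hypothetical sketch of the reuse-or-create rule described above.
    """
    if virtualenv_dir is None:
        # Flag not set: keep the previous behaviour and build a throwaway venv.
        tmp_dir = Path(tempfile.mkdtemp(prefix="venv-"))
        venv.create(tmp_dir, with_pip=False)
        return tmp_dir, True

    if python_binary(virtualenv_dir).exists():
        # The interpreter is already there, so treat the venv as existing.
        return virtualenv_dir, False

    # Flag set but no interpreter found: create the venv at the given path.
    venv.create(virtualenv_dir, with_pip=False)
    return virtualenv_dir, True
```

Calling `resolve_venv` twice with the same directory creates the venv on the first call and reuses it on the second, which is where the runtime saving would come from. Installing requirements into the venv is deliberately left out of this sketch.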

Impact

A very basic test using a local Docker Compose set-up as per the contribution guide and the example_virtualenv DAG saw the DAG's runtime go down from 2m31s to just 32s. I'd expect this improvement to be even more noticeable with more complex graphs and more Python requirements.

Related Issue(s)

Closes: #610
Follow up ticket: #1157

Breaking Change?

None, the flag is optional and is ignored (with a warning) when used outside of VirtualEnv execution mode.
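To illustrate the "ignored with a warning" behaviour, here is a small sketch under stated assumptions. The `Mode` enum and `Config` dataclass are hypothetical stand-ins and not the real ExecutionConfig; only the standard library is used.

```python
import warnings
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional


class Mode(Enum):
    """Hypothetical stand-in for the project's execution modes."""
    LOCAL = "local"
    VIRTUALENV = "virtualenv"


@dataclass
class Config:
    """Hypothetical stand-in for a config object carrying virtualenv_dir."""
    execution_mode: Mode = Mode.LOCAL
    virtualenv_dir: Optional[Path] = None

    def effective_virtualenv_dir(self) -> Optional[Path]:
        """Return the directory to use, or None if the option does not apply."""
        if self.virtualenv_dir is not None and self.execution_mode is not Mode.VIRTUALENV:
            # The option is meaningless outside virtualenv mode: warn and ignore it.
            warnings.warn(
                "virtualenv_dir is only used in virtualenv execution mode; ignoring it",
                UserWarning,
                stacklevel=2,
            )
            return None
        return self.virtualenv_dir
```

Because the option defaults to `None` and is dropped with a warning in other modes, existing configurations would keep working unchanged, which is consistent with the claim that nothing breaks.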

Important notice

Most of the changes in this PR were originally implemented in PR #611 by @LennartKloppenburg. It became stale over the last few months due to limited maintainer availability. Our sincere apologies to the original author.

What was accomplished since:

  1. Rebased
  2. Fixed conflicts
  3. Fixed failing tests
  4. Introduced new tests

Co-authored-by: Lennart Kloppenburg [email protected]

netlify bot commented Jul 5, 2024

Deploy Preview for sunny-pastelito-5ecb04 ready!

Name Link
🔨 Latest commit f4d0c3a
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/66bf33c3e175330008818b54
😎 Deploy Preview https://deploy-preview-1079--sunny-pastelito-5ecb04.netlify.app

codecov bot commented Jul 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.35%. Comparing base (41053ed) to head (f4d0c3a).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1079      +/-   ##
==========================================
+ Coverage   91.05%   96.35%   +5.29%     
==========================================
  Files          64       64              
  Lines        3568     3568              
==========================================
+ Hits         3249     3438     +189     
+ Misses        319      130     -189     

@tatiana added this to the Cosmos 1.6.0 milestone Jul 8, 2024
Resolved review thread (outdated): .github/workflows/test.yml
@dosubot (bot) added the lgtm label (This PR has been approved by a maintainer) Aug 14, 2024
@pankajkoti (Contributor) left a comment

Sorry for the delayed review, I missed reviewing this earlier.

Looks great to me mostly, some questions inline.

Inline review threads (all resolved): cosmos/converter.py (1 thread), cosmos/operators/virtualenv.py (5 threads, 4 outdated)
@tatiana merged commit 4273d99 into main Aug 16, 2024
60 checks passed
@tatiana deleted the feature/cache-virtualenv branch August 16, 2024 11:13
@pankajkoti mentioned this pull request Aug 16, 2024
pankajkoti added a commit that referenced this pull request Aug 20, 2024
New Features

* Add support for loading manifest from cloud stores using Airflow
Object Storage by @pankajkoti in #1109
* Cache ``package-lock.yml`` file by @pankajastro in #1086
* Support persisting the ``LoadMode.VIRTUALENV`` directory by @tatiana
in #1079
* Add support to store and fetch ``dbt ls`` cache in remote stores by
@pankajkoti in #1147
* Add default source nodes rendering by @arojasb3 in #1107
* Add Teradata ``ProfileMapping`` by @sc250072 in #1077

Enhancements

* Add ``DatabricksOauthProfileMapping`` profile by @CorsettiS in #1091
* Use ``dbt ls`` as the default parser when ``profile_config`` is
provided by @pankajastro in #1101
* Add task owner to dbt operators by @wornjs in #1082
* Extend Cosmos custom selector to support + when using paths and tags
by @mvictoria in #1150
* Simplify logging by @dwreeves in #1108

Bug fixes

* Fix Teradata ``ProfileMapping`` target invalid issue by @sc250072 in
#1088
* Fix empty tag in case of custom parser by @pankajastro in #1100
* Fix ``dbt deps`` in ``LoadMode.DBT_LS`` so that it uses
``ProjectConfig.dbt_vars`` by @tatiana in #1114
* Fix import handling by lazy loading hooks introduced in PR #1109 by
@dwreeves in #1132
* Fix Airflow 2.10 regression and add Airflow 2.10 in test matrix by
@pankajastro in #1162

Docs

* Fix typo in azure-container-instance docs by @pankajastro in #1106
* Use Airflow trademark as it has been registered by @pankajastro in
#1105

Others

* Run some example DAGs in Kubernetes execution mode in CI by
@pankajastro in #1127
* Install requirements.txt by default during dev env spin up by
@CorsettiS in #1099
* Remove ``DbtGraph.current_version`` dead code by @tatiana in #1111
* Disable test for Airflow-2.5 and Python-3.11 combination in CI by
@pankajastro in #1124
* Pre-commit hook updates in #1074, #1113, #1125, #1144, #1154, #1167

---------

Co-authored-by: Pankaj Koti <[email protected]>
Co-authored-by: Pankaj Singh <[email protected]>
tatiana added a commit that referenced this pull request Aug 23, 2024
Add the missing `virtualenv_dir` param introduced in PR #1079
to the `Execution Config` docs. Also note that it was added in v1.6.

closes: #1172

Co-authored-by: Tatiana Al-Chueyr <[email protected]>
Labels
area:execution (Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc)
execution:virtualenv (Related to Virtualenv execution environment)
lgtm (This PR has been approved by a maintainer)
size:XL (This PR changes 500-999 lines, ignoring generated files.)
Development

Successfully merging this pull request may close these issues.

Support caching virtualenvs created when using ExecutionMode.VIRTUALENV
7 participants