feat: add support for git-sync with PVC (central sync deployment) #575
Conversation
The more I think about this, the less sense it makes to keep three strategies for copying the files. Currently, if this PR or something along these lines goes forward, we'll have 3 ways of doing this:
For the people currently using the first option, replacing it with number 3 makes almost no difference, with one caveat: if their DAGs create files on the local volumes, then mounting the shared volume directly on the pods means those files could propagate to other pods as well, whereas with git-sync as it works today this never happens. Rolling out option 3 would therefore be a breaking change. Here's what I propose instead:
If we manage to do it this way, it'd mean that:
@thesuperzapper what do you think? I can try to move this PR in that direction if you'd be up for guiding me a bit.
Signed-off-by: Burak Karakan <[email protected]>
…tsync Signed-off-by: Burak Karakan <[email protected]>
…enabled Signed-off-by: Burak Karakan <[email protected]>
feb6191 to 3512922
…loyment Signed-off-by: Burak Karakan <[email protected]>
I went ahead and attempted the change with the copy trick, but a couple of things are still not working:
I'll try to work on this again soon, but I'm leaving it here in case you can tackle it.
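For readers following along, here is a rough sketch of what a copy step like this could look like. It assumes the shared volume is mounted somewhere readable and that each pod copies the DAGs into its own local directory, so anything a task writes stays in that pod. The function name `copy_dags` and both paths are hypothetical, not taken from the chart.

```python
import shutil
import tempfile
from pathlib import Path


def copy_dags(shared_dir: Path, local_dir: Path) -> None:
    """Mirror shared_dir into local_dir so later writes stay pod-local."""
    # Start from a clean target so files deleted upstream disappear locally too.
    if local_dir.exists():
        shutil.rmtree(local_dir)
    shutil.copytree(shared_dir, local_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp) / "shared"  # stands in for the shared PVC mount (assumption)
        local = Path(tmp) / "local"    # stands in for a pod-local directory (assumption)
        shared.mkdir()
        (shared / "example_dag.py").write_text("# dag definition\n")

        copy_dags(shared, local)
        # A file written by a task lands only in the local copy.
        (local / "scratch.txt").write_text("task output\n")

        print(sorted(p.name for p in shared.iterdir()))
        print(sorted(p.name for p in local.iterdir()))
```

The point of the sketch is only the isolation property: the shared directory is never written to by the consuming pod, which avoids the cross-pod propagation concern raised earlier in the thread.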
I think I have managed to get it working: the contents are mounted properly, and the scheduler is able to see and schedule tasks as well. My local setup is limited, so I can't test the whole flow, but as far as these changes are concerned, I think it is in okay shape. The things missing are:
Any feedback is appreciated.
Another use case for something like this:
For processes like gitSync, where maybe 90% of the fetch operations do not change anything, I'd like to be able to set some sort of failure strategy for the gitSync operation so that it doesn't bring Airflow down for everyone and instead works with the stale copy of the DAGs where necessary. Having a separate deployment allows that.
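A minimal sketch of such a failure strategy, assuming the sync step can be modelled as a callable that returns the revision it fetched. `sync_once` and the `fetch` callable are hypothetical names for illustration and are not part of git-sync or the chart.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("dag_sync")


def sync_once(fetch: Callable[[], str], current_rev: Optional[str]) -> Optional[str]:
    """Run one fetch; on failure keep serving the previously synced revision."""
    try:
        return fetch()
    except Exception as exc:
        # Do not crash the sync process: the DAGs already on disk stay usable.
        log.warning("sync failed, keeping stale copy at %s: %s", current_rev, exc)
        return current_rev


if __name__ == "__main__":
    def flaky_fetch() -> str:
        raise RuntimeError("remote unreachable")

    rev = sync_once(lambda: "abc123", None)  # first sync succeeds
    rev = sync_once(flaky_fetch, rev)        # failure leaves the old revision in place
    print(rev)
```

Because the failure is absorbed inside the sync process, consumers of the synced directory keep reading the last good copy rather than failing alongside the fetch.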
Any news on this PR?
I think it is ready for review.
What issues does your PR fix?
What does your PR do?
This PR is a quick attempt to introduce a single git-sync pod that updates the DAGs on a PVC, so that the rest of the pods can just mount this directory instead of cloning the repos over and over again. I am not sure if this is the right way to go, so I am open to feedback, and I haven't really tested this yet.
There is a very good chance that this is a completely wrong way of implementing this, in which case I'd be totally okay to close the PR.
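To illustrate the single-writer, many-readers idea, here is a sketch of one way a central sync process could publish a new revision onto a shared directory without readers ever seeing a half-written tree. It assumes a POSIX filesystem and uses a symlink swap. The `publish` helper, the `rev-<revision>` layout, and the `current` link name are assumptions for illustration, not what this PR implements.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict


def publish(root: Path, revision: str, files: Dict[str, str]) -> Path:
    """Write a revision into its own directory, then repoint `current` at it."""
    rev_dir = root / f"rev-{revision}"
    rev_dir.mkdir(parents=True, exist_ok=True)
    for name, content in files.items():
        (rev_dir / name).write_text(content)

    # Build the new link under a temporary name, then rename it over the old one.
    # The rename replaces the link itself, so readers see either old or new.
    tmp_link = root / ".current.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(rev_dir.name, tmp_link)  # relative target inside root
    os.replace(tmp_link, root / "current")
    return root / "current"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)  # stands in for the PVC mount (assumption)
        link = publish(root, "a", {"dag.py": "v1"})
        publish(root, "b", {"dag.py": "v2"})
        print(os.readlink(link), (link / "dag.py").read_text())
```

Reader pods would then point their DAGs folder at `current` and never need to clone anything themselves.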
Checklist
For all Pull Requests
For releasing ONLY