Vine: temp file replication #3564
Proof of concept
Please describe the changes above in the nice form provided. :)
Nice! Let me observe that "now" doesn't really mean "synchronously right away" but rather "without waiting for a dependency to require it". The transfer could still be delayed for reasons of capacity, concurrency limits at the worker, etc. A question: should all URL requests be done "now"? Is there an upside to having "normal" transfers wait until something requests their presence? @BarrySlyDelgado what do you think?
The answer is likely that it depends. I think it is mostly beneficial if you are not worried about resource utilization. For example, with limited disk space, downloading files before they are actually needed could fill up space quickly. However, if resources are not a concern and a worker is free to download a specific file from a remote source, it might as well do so, since that could save startup time for future task executions. Regarding concurrency limits, a worker could be instructed to download a file "now", hit its concurrency limit, and thereby block a task from running, though this is probably a limited case.
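The trade-off above (eager fetching saves startup time, but can consume disk space and take transfer slots that a task needs) can be expressed as a small admission check. The sketch below is a hypothetical policy, not code from this PR: the struct, its fields, `should_fetch_now`, and the headroom parameter are all illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of a worker's state; field names are illustrative only. */
struct worker_state {
    int64_t disk_free_bytes;     /* space currently free in the worker cache */
    int64_t disk_reserved_bytes; /* space already promised to running tasks */
    int active_transfers;        /* transfers currently in flight at this worker */
    int max_transfers;           /* worker-side concurrency limit */
};

/* Hypothetical policy: fetch a URL eagerly ("now") only if
 * (1) the file fits in unreserved disk space while leaving headroom_bytes free, and
 * (2) taking a transfer slot still leaves at least one slot for transfers
 *     that a waiting task actually depends on. */
bool should_fetch_now(const struct worker_state *w, int64_t file_size, int64_t headroom_bytes)
{
    assert(w != NULL && file_size >= 0 && headroom_bytes >= 0);

    int64_t usable = w->disk_free_bytes - w->disk_reserved_bytes - headroom_bytes;
    if (file_size > usable)
        return false; /* would eat into space that tasks may need */

    if (w->active_transfers + 1 >= w->max_transfers)
        return false; /* would take the last free transfer slot */

    return true;
}
```

Keeping one slot in reserve is one way to avoid the case where an eager download blocks a task's own input transfer; a real scheduler might instead let task-driven transfers preempt or bypass eager ones.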
Had a successful large-scale run where worker loss was handled without recovery tasks. Still need to refine further, as it is not distributing the files evenly and is definitely overloading specific workers, both as senders and as receivers.
That's great! Balance is a tricky issue. This is not quite as simple as it first appears...
Please rebase on master to get a workaround for the OSX build.
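One simple way to attack the imbalance described above (specific workers overloaded both as senders and as receivers) is greedy least-loaded placement: pick the source replica with the fewest outgoing transfers, and pick destinations with the fewest incoming transfers, capping both. This is a hypothetical sketch rather than what the PR implements; `struct worker_load`, `pick_source`, `pick_targets`, and the caps are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-worker bookkeeping kept by the manager. */
struct worker_load {
    bool has_file; /* worker already holds (or is receiving) a replica */
    int incoming;  /* transfers currently being received */
    int outgoing;  /* transfers currently being served to peers */
};

/* Pick the replica holder with the fewest outgoing transfers as the source,
 * so one popular worker does not serve every copy. Returns -1 if no holder
 * is below max_outgoing. Ties go to the lowest index. */
int pick_source(const struct worker_load *w, size_t n, int max_outgoing)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!w[i].has_file || w[i].outgoing >= max_outgoing)
            continue;
        if (best < 0 || w[i].outgoing < w[best].outgoing)
            best = (int)i;
    }
    return best;
}

/* Pick up to k destinations lacking the file, preferring the fewest incoming
 * transfers and skipping workers at max_incoming. Each pick is recorded in w
 * (marked as holding a pending replica, incoming incremented), so it is not
 * chosen twice and later picks see the updated load. Returns the count chosen. */
size_t pick_targets(struct worker_load *w, size_t n, size_t k, int max_incoming, size_t *out)
{
    assert((w != NULL || n == 0) && (out != NULL || k == 0));
    size_t chosen = 0;
    while (chosen < k) {
        int best = -1;
        for (size_t i = 0; i < n; i++) {
            if (w[i].has_file || w[i].incoming >= max_incoming)
                continue;
            if (best < 0 || w[i].incoming < w[best].incoming)
                best = (int)i;
        }
        if (best < 0)
            break; /* no eligible worker left */
        w[best].has_file = true; /* pending replica */
        w[best].incoming++;
        out[chosen++] = (size_t)best;
    }
    return chosen;
}
```

Greedy selection only balances the current snapshot. Whether it is enough at scale depends on how quickly the manager's view of each worker's load is updated as transfers complete.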
oops
replicate your pr!
Reopening after accidental hard reset :)
This is on the right track, I know you are still looking at scheduling/placement issues.
Status on this one?
File distribution is looking good. Still need to clean up the interface as requested.
Last little bit -
I will go the flag route, since we have been discussing ensuring the cache on task startup, and we may think of other ideas.
And finally, make sure the setting is documented in
Proposed changes
Related to #3563
At the moment, upon a cache update, the manager checks whether a temp file was created.
If so, the manager looks for up to q->temp_replica_count workers to replicate the file to.
It uses the new message, puturl_now, to tell a worker to bring a remote file into its cache without the need to run a task.
Post-change actions
Put an 'x' in the boxes that describe post-change actions that you have done.
The more 'x' ticked, the faster your changes are accepted by maintainers.
make test
Run local tests prior to pushing.
make format
Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python).
make lint
Run lint on source code prior to pushing.
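The replication flow described under Proposed changes (on a cache update, check for a temp file, then request up to q->temp_replica_count replicas via puturl_now) can be sketched as below. Only temp_replica_count and the puturl_now message name come from the PR. The `struct cache_update` fields, both helper functions, and especially the space-separated message layout are assumptions for illustration, not the actual vine protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical view of a cache-update event as seen by the manager. */
struct cache_update {
    const char *cachename; /* name of the file in the worker cache */
    bool is_temp;          /* true if the file is a task-produced temp file */
    int current_replicas;  /* replicas the manager already knows about */
};

/* How many additional replicas to request: none for non-temp files,
 * otherwise enough to reach temp_replica_count. */
int replicas_needed(const struct cache_update *u, int temp_replica_count)
{
    assert(u != NULL);
    if (!u->is_temp || u->current_replicas >= temp_replica_count)
        return 0;
    return temp_replica_count - u->current_replicas;
}

/* Format an illustrative puturl_now message telling a destination worker to
 * fetch cachename from source_url immediately, without waiting for a task.
 * ASSUMED layout: "puturl_now <source_url> <cachename> <size>\n".
 * Returns the message length, or -1 if a field contains a space (which would
 * break this space-separated layout) or the buffer is too small. */
int format_puturl_now(char *buf, size_t len, const char *source_url,
                      const char *cachename, long long size)
{
    if (strchr(source_url, ' ') || strchr(cachename, ' '))
        return -1;
    int n = snprintf(buf, len, "puturl_now %s %s %lld\n", source_url, cachename, size);
    if (n < 0 || (size_t)n >= len)
        return -1; /* encoding error or truncated message */
    return n;
}
```

In use, the manager would call replicas_needed on each cache update and, for each destination it selects, send one formatted message. Keeping the count calculation separate from message formatting makes it easy to change the placement policy without touching the wire format.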