Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two chunk instances downloading the same file #36

Open
danielfireman opened this issue Nov 24, 2022 · 3 comments
Open

Two chunk instances downloading the same file #36

danielfireman opened this issue Nov 24, 2022 · 3 comments
Labels
question Further information is requested

Comments

@danielfireman
Copy link
Collaborator

danielfireman commented Nov 24, 2022

Do we care about the same file being downloaded simultaneously by two chunk instances? Asking because I believe this can a pre-release feature.

Example:

$ ./chunk HTTP://a.b/c &
$ ./chunk HTTP://a.b/c &

The result of this sequence of operations is unknown.

We could deal with it after #8, by augmenting the infrastructure in place.

cc/ @cuducos

@danielfireman danielfireman added the question Further information is requested label Nov 24, 2022
@cuducos
Copy link
Owner

cuducos commented Nov 25, 2022

With the changes from #38 we started to have a more predictable scenario here.

This would still be a scenario we wouldn't be prepared for:

$ ./chunk HTTP://a.b/c &
$ ./chunk HTTP://a.b/c &

But this would be OK:

$ ./chunk HTTP://a.b/c &
$ cd another-directory && ./chunk HTTP://a.b/c &

What would happen here is that the file c would be downloaded twice since the local path would be different.

In my opinion, two processing downloading to the same local path should not be allowed (classic concurrency problem both on the downloaded file and on the progress file). I am just not sure about how to control and prevent that.

@cuducos
Copy link
Owner

cuducos commented Nov 25, 2022

If #38 is ok, what about we create a ~/.chunk/lock file to only allow one instance of chunk running at a time?

@danielfireman
Copy link
Collaborator Author

danielfireman commented Nov 26, 2022

If #38 is ok, what about we create a ~/.chunk/lock file to only allow one instance of chunk running at a time?

It is possible, but I believe we don't need to make it too restrictive. Bellow, I try to do some brainstorming about a less restrictive option: consider each progress management file a lock file.

That has the same complexity as creating a single lock file but allows us to have as many chunk instances as needed, as long as they do not download the same file.

That said, there is one faulty case missing that we should treat either way: what happens when the chunk instance crashes?

There is a straightforward way to solve this problem:

  • keep and update the PID of the process in the progress management file
  • keep and update the timestamp of the last activity (chunk download)

With that, every time a download is started, chunk should do as follows:

  1. verify if the progress manager file is there
    1. if it is, check the owner PID (if it is the owner, proceed) and the last activity (if it is too long ago, become the owner and proceed)
    2. otherwise, create a progress manager file as usual and proceed

Of course, we could introduce the --force-start flag in the future, which guarantees the current chunk instance rules them all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants