Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrap compilation in a cross-os-process lock #13860
Wrap compilation in a cross-os-process lock #13860
Changes from 7 commits
2dd64c6
745b610
308ceae
48b6da6
7ca97ce
76a455e
90dd9d5
d870aa7
4f90a17
854e1b5
b9b56fc
a11d1c3
0ab81d7
624d56e
313868c
b76f795
18fc733
4ddd5b2
d9a6eea
162c42b
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If any other error happens here, we'll get a nasty
CaseClauseError
. Some of those are transient (:system_limit
?), some are not. Maybe we do a simple algorithm of:{:error, reason}
and we don't know if reason is transient or not (basically catch all), then sleep for like a second and go back to 1.With a max number of like 2 or 3 retries for now? I know that just rerunning
mix compile
would likely achieve the same, but it seems like a small price to pay for a potentially-better UX. Thoughts?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned that if we fail to accept, it means the client fails to connect and assumes the socket is stale. So it's probably safer to crash the lock owner to avoid corruption? We could raise a more specific error to avoid
CaseClauseError
.When we reach
system_limit
on open files/sockets it usually blows everything up anyway? Are there any other transient errors you worry about? I'm looking at the list in linux manual and I'm not sure if it makes sense to retry any of them (other than EINTR).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The accepting process is sort of holding its own little lock, so this confuses me a bit:
The client would fail to connect with
:closed
if the socket is closed, not with other errors, no?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah you're right, I think these are not tied. The client is put in the accept queue and at that point they connect successfully, whether
:gen_tcp.accept
grabs it from the queue is separate.But if the idea is to sleep and retry accepting, this heavily increases the chance that we don't send the probe data to the client on time, which is critical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, nice trick!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this work? Nitpick, because your version works too 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to remove all port_* files first, only then all lock_* files. Not sure if wildcard guarantees the order, but even if it does, I'd rather be explicit about the order of operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't guarantee that as far as I know. Why do we need to delete all port files first? Might be worth leaving a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sidenote: we actually want all files in the directory, so semantically there's no need for wildcard.
This part from the doc comment:
This is best seen on an example, so imagine this:
Process A has port 100 and owns lock_2.
Process B has port 200 and is paused right before
File.ln("port_200", "lock_1")
.Process A first moves port_100 to lock_0. And now wants to remove files.
Let's say process A were to remove lock_1 first. Then process B wakes up and successfully links. At this point Process B will overtake A. Contrarily, if we remove port_200 first, when process B wakes up, it will fail to link.
In other words, once we are sure we are the new owner, we remove port_* files to invalidate any peers attempting to also get the lock.
Does that make sense? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, makes sense. My head hurts, but it makes sense 😉 Let's maybe leave a comment here too saying that we need to delete these files in the specific order then, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW the module comment already says they need to be deleted in order. There was also comment where
take_over
is called, but I moved it here, since it's more relevant here :)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfecto!