Don't unwrap when removing invalid TCP state from fastpath. #621
Now that the fastpath temporarily drops (and reacquires) the port lock when TCP state needs to be invalidated, we have opened ourselves up to a race: while the lock is released, another packet could remove this state ahead of us. Equally, a packet could insert new TCP state under the same flow ID, which we might then accidentally remove. This PR removes the `unwrap` on removal to tolerate the first case, and handles the second by only removing a TCP flow if the table entry is pointer-equal (`Arc::ptr_eq`) to the one we set out to invalidate. Closes #618.
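A minimal sketch of the guarded removal described in the PR, assuming a std `Mutex<BTreeMap<…>>` stands in for the port lock and flow table. `FlowId`, `TcpState`, `Port`, and `remove_if_same` are hypothetical names for illustration, not OPTE's actual types or API. It shows both halves of the race: a replaced entry is left alone, and a missing entry is a no-op instead of a panic.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for the UFID key and per-flow TCP state.
type FlowId = u32;

#[derive(Debug)]
struct TcpState {
    generation: u32,
}

struct Port {
    tcp_flows: Mutex<BTreeMap<FlowId, Arc<TcpState>>>,
}

impl Port {
    /// Remove `id` only if the table still holds the exact `Arc` we saw
    /// before the lock was dropped. Returns whether anything was removed.
    fn remove_if_same(&self, id: FlowId, seen: &Arc<TcpState>) -> bool {
        let mut flows = self.tcp_flows.lock().unwrap();
        // `None` (already removed) or a different allocation (fresh state
        // inserted by another packet) both mean: do not touch the table.
        let same = matches!(flows.get(&id), Some(cur) if Arc::ptr_eq(cur, seen));
        if same {
            flows.remove(&id);
        }
        same
    }
}

fn main() {
    let port = Port { tcp_flows: Mutex::new(BTreeMap::new()) };
    let stale = Arc::new(TcpState { generation: 0 });
    port.tcp_flows.lock().unwrap().insert(7, Arc::clone(&stale));

    // Simulate another packet replacing the state while our lock was dropped.
    let fresh = Arc::new(TcpState { generation: 1 });
    port.tcp_flows.lock().unwrap().insert(7, Arc::clone(&fresh));

    // The stale handle no longer matches, so the fresh state survives.
    assert!(!port.remove_if_same(7, &stale));
    assert_eq!(port.tcp_flows.lock().unwrap()[&7].generation, 1);

    // Removing a flow that is already gone is a no-op rather than a panic.
    assert!(port.remove_if_same(7, &fresh));
    assert!(!port.remove_if_same(7, &fresh));
    println!("ok");
}
```

Comparing with `Arc::ptr_eq` rather than by value matters here: freshly inserted state for the same flow could compare equal field-by-field, but it is a different allocation.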
I sadly haven't been able to reproduce the crash itself locally, using either zone-to-zone traffic or traffic between local VMs via standalone omicron. Both this PR and …
Maybe try this for many flows simultaneously with distinct address pairs? This would get a bit closer to rack traffic conditions.
I've scaled up the topology a bit (all instances running omicron's built-in Alpine Linux).
No crashes as yet on …
LGTM. Just left one suggestion for coalescing the flow state lookup+removal.
```rust
if let Some(found_entry) = local_lock.tcp_flows.get(ufid_out) {
    if Arc::ptr_eq(found_entry, &entry) {
        self.uft_tcp_closed(&mut local_lock, ufid_out, ufid_in);
        _ = local_lock.tcp_flows.remove(ufid_out);
    }
}
```
If it's OK to swap the order here, you could use the `Entry` API to look up and remove the TCP flow, saving a second search through the table:
```diff
-if let Some(found_entry) = local_lock.tcp_flows.get(ufid_out) {
-    if Arc::ptr_eq(found_entry, &entry) {
-        self.uft_tcp_closed(&mut local_lock, ufid_out, ufid_in);
-        _ = local_lock.tcp_flows.remove(ufid_out);
-    }
-}
+if let Entry::Occupied(found_entry) = local_lock.tcp_flows.entry(*ufid_out) {
+    if Arc::ptr_eq(found_entry.get(), &entry) {
+        _ = found_entry.remove_entry();
+        self.uft_tcp_closed(&mut local_lock, ufid_out, ufid_in);
+    }
+}
```
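To illustrate the single-lookup pattern the suggestion relies on, here is a standalone sketch against std's `BTreeMap` (which does have an `Entry` API), not OPTE's `FlowTable`. The helper name `remove_if_ptr_eq` is hypothetical, and the sketch does not model the `uft_tcp_closed` call or its ordering.

```rust
use std::collections::btree_map::{BTreeMap, Entry};
use std::sync::Arc;

/// Hypothetical helper: remove `key` only if it still maps to the same
/// allocation as `expected`, using one tree search via the `Entry` API.
fn remove_if_ptr_eq<K: Ord, V>(
    map: &mut BTreeMap<K, Arc<V>>,
    key: K,
    expected: &Arc<V>,
) -> Option<Arc<V>> {
    match map.entry(key) {
        // `found` is moved into the arm after the guard; `remove` consumes
        // the entry, so no second lookup is needed.
        Entry::Occupied(found) if Arc::ptr_eq(found.get(), expected) => Some(found.remove()),
        // Vacant (nothing is inserted), or occupied by different state.
        _ => None,
    }
}

fn main() {
    let mut flows: BTreeMap<u32, Arc<&str>> = BTreeMap::new();
    let old = Arc::new("old");
    let new = Arc::new("new");
    flows.insert(1, Arc::clone(&new));

    // `old` is a different allocation, so nothing is removed.
    assert!(remove_if_ptr_eq(&mut flows, 1, &old).is_none());
    assert!(flows.contains_key(&1));
    // A vacant key is a no-op and does not create an entry.
    assert!(remove_if_ptr_eq(&mut flows, 2, &old).is_none());
    assert!(!flows.contains_key(&2));
    // Matching allocation: removed with a single lookup.
    assert!(remove_if_ptr_eq(&mut flows, 1, &new).is_some());
    assert!(flows.is_empty());
}
```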
Thanks a lot for the review, Luqman.
I think this would be a good idea (the swap is valid as I read it), except that `local_lock.tcp_flows` here is a `FlowTable` and not a `BTreeMap`. To enable that, it'd be nice to forward the `Entry` API through `FlowTable` while still respecting the capacity constraints in `FlowTable::add` etc. I'll open a ticket.
EDIT: #627.
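One possible shape for forwarding an `Entry` API while keeping a capacity limit, sketched under assumptions: this `FlowTable` is a hypothetical stand-in (a `BTreeMap` plus a `limit`), not OPTE's actual type, and `FlowEntry`, `VacantFlowEntry`, and `CapacityExceeded` are illustrative names. Occupied entries are passed through unchanged, while inserting through a vacant entry is refused once the table is full.

```rust
use std::collections::btree_map::{self, BTreeMap};

/// Hypothetical capacity-limited table (not OPTE's real `FlowTable`).
struct FlowTable<K, V> {
    map: BTreeMap<K, V>,
    limit: usize,
}

#[derive(Debug)]
struct CapacityExceeded;

enum FlowEntry<'a, K, V> {
    Occupied(btree_map::OccupiedEntry<'a, K, V>),
    Vacant(VacantFlowEntry<'a, K, V>),
}

struct VacantFlowEntry<'a, K, V> {
    inner: btree_map::VacantEntry<'a, K, V>,
    full: bool,
}

impl<'a, K: Ord, V> VacantFlowEntry<'a, K, V> {
    /// Insert only if there is room, mirroring a capacity check in `add`.
    fn insert(self, value: V) -> Result<&'a mut V, CapacityExceeded> {
        if self.full {
            Err(CapacityExceeded)
        } else {
            Ok(self.inner.insert(value))
        }
    }
}

impl<K: Ord, V> FlowTable<K, V> {
    fn new(limit: usize) -> Self {
        Self { map: BTreeMap::new(), limit }
    }

    fn entry(&mut self, key: K) -> FlowEntry<'_, K, V> {
        // Snapshot fullness before the entry mutably borrows the map.
        let full = self.map.len() >= self.limit;
        match self.map.entry(key) {
            btree_map::Entry::Occupied(o) => FlowEntry::Occupied(o),
            btree_map::Entry::Vacant(v) => FlowEntry::Vacant(VacantFlowEntry { inner: v, full }),
        }
    }
}

fn main() {
    let mut table: FlowTable<u32, &str> = FlowTable::new(1);
    match table.entry(1) {
        FlowEntry::Vacant(v) => assert!(v.insert("a").is_ok()),
        FlowEntry::Occupied(_) => unreachable!(),
    }
    // The table is now full, so a second vacant insert is refused.
    match table.entry(2) {
        FlowEntry::Vacant(v) => assert!(v.insert("b").is_err()),
        FlowEntry::Occupied(_) => unreachable!(),
    }
    // Existing keys remain reachable and can be removed with one lookup.
    match table.entry(1) {
        FlowEntry::Occupied(o) => assert_eq!(o.remove(), "a"),
        FlowEntry::Vacant(_) => unreachable!(),
    }
    assert!(table.map.is_empty());
}
```

Taking the `full` snapshot before calling `entry` is sound in this sketch because the returned entry holds an exclusive borrow of the table, so the length cannot change until the entry is consumed.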
Closes #618, closes #624.