Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible race conditions #45

Closed
daladim opened this issue Oct 10, 2022 · 3 comments
Closed

Possible race conditions #45

daladim opened this issue Oct 10, 2022 · 3 comments

Comments

@daladim
Copy link
Collaborator

daladim commented Oct 10, 2022

Race 1

There is no specific synchronization mechanism in trace_callback_thunk().
Which means a thread could be closing the trace session and destroying the TraceData, just at the same time the last callback was triggered. And the event_record.user_context dereferenced in another thread, supposed to point to a TraceData could be dangling here

Race 2

If a thread closes the trace session and destroys the TraceData, the associated Provider, hence the associated callbacks, which are closures, are dropped. That's a problem in case a callback is still in progress (e.g. a callback is stuck in a blocking function call), because the closure may contain state (e.g. in a move || closure).

Notes

krabsetw may have the same issues. See the issue in krabsetw

@daladim
Copy link
Collaborator Author

daladim commented Oct 10, 2022

I'm on my way to fix it.

But this (more or less) requires some changes to make the unsafes less unsafe. E.g.

  • have a Builder pattern, and less pub members, to guarantee more data stays constant during the lifetime of the trace session. Because having non-mutable data is a requirement when accessing this data from a raw pointer (e.g. the one given by the ETW callback)
  • To make the review easier, I may simplify some types (e.g. remove the distinction between TraceInfo and EventTraceProperties, or flatten some data from TraceInfo into TraceData, remove the fill_info() functions, etc.
  • etc.

daladim added a commit to daladim/ferrisetw that referenced this issue Oct 20, 2022
Wrapping it into an Arc ensures we're not dropping it when the trace is stopped, but we're waiting for the potential callbacks to terminate first

This fixes "Race 2" in n4r1b#45 (n4r1b#45)
daladim added a commit to JustRustThings/ferrisetw that referenced this issue Oct 21, 2022
Wrapping it into an Arc ensures we're not dropping it when the trace is stopped, but we're waiting for the potential callbacks to terminate first

This fixes "Race 2" in n4r1b#45 (n4r1b#45)
daladim added a commit to JustRustThings/ferrisetw that referenced this issue Oct 21, 2022
@daladim
Copy link
Collaborator Author

daladim commented Nov 3, 2022

Note that race 2 is rather likely. If CloseTrace returns an ERROR_CTX_CLOSE_PENDING, this means the callback will be invoked again, after we may be tempted to drop the closure.

daladim added a commit to daladim/ferrisetw that referenced this issue Nov 3, 2022
This largely refactors trace.rs and evntrace.rs
* evntrace now solely contains safe wrappers over Windows API functions, without any internal state (struct NativeEtw has been removed)
  that's the duty of trace.rs to handle these API correctly and in the right order.
* Traces instances are now created with a Builder pattern, to clearly mark which fields may be mutable and which will stay constant.
  This will make it very easy to fix races in issue n4r1b#45

Also, as minor improvements:
* the builder now enforces the trace name is truncated to TRACE_NAME_MAX_CHARS, so that both EVENT_TRACE_LOGFILEW and EVENT_TRACE_PROPERTIES have consistently truncated logger names
* TraceData is renamed CallbackData.
  That's mainly a matter of taste, even though it makes its intent clearer, and thus makes it easier to review the `unsafe` blocks
* errors from evntrace are now better forwarded to callers
* checks for invalid handles from the Windows API has been made more explicit
* the public API for traces (and trace builder) is now simplified, and hides some of the "really weird" (to say the least) design choices of ETW.
  Distinction between open/start/process is now clearer to the user
  Also, the `process` now exists in different flavours, that do not all hide the thread spawning. This offers more control to the end user.
* Traces can explictly be closed, and are closed if still open on Drop (maybe that was the case in ferrisetw 0.1, I'm not sure)
* This removes the distinction between TraceTrait and TraceBaseTrait

Sorry, I did not manager to split this large commit into smaller chunks.
It's probably easier to read only the result of it rather than the diffs, which do not make much sense since most of evntrace.rs and trace.rs are now diffs.
daladim added a commit to daladim/ferrisetw that referenced this issue Nov 3, 2022
Wrapping it into an Arc ensures we're not dropping it when the trace is stopped, but we're waiting for the potential callbacks to terminate first

This fixes "Race 2" in n4r1b#45 (n4r1b#45)
daladim added a commit to daladim/ferrisetw that referenced this issue Nov 3, 2022
daladim added a commit to daladim/ferrisetw that referenced this issue Nov 7, 2022
This largely refactors trace.rs and evntrace.rs
* evntrace now solely contains safe wrappers over Windows API functions, without any internal state (struct NativeEtw has been removed)
  that's the duty of trace.rs to handle these API correctly and in the right order.
* Traces instances are now created with a Builder pattern, to clearly mark which fields may be mutable and which will stay constant.
  This will make it very easy to fix races in issue n4r1b#45

Also, as minor improvements:
* the builder now enforces the trace name is truncated to TRACE_NAME_MAX_CHARS, so that both EVENT_TRACE_LOGFILEW and EVENT_TRACE_PROPERTIES have consistently truncated logger names
* TraceData is renamed CallbackData.
  That's mainly a matter of taste, even though it makes its intent clearer, and thus makes it easier to review the `unsafe` blocks
* errors from evntrace are now better forwarded to callers
* checks for invalid handles from the Windows API has been made more explicit
* the public API for traces (and trace builder) is now simplified, and hides some of the "really weird" (to say the least) design choices of ETW.
  Distinction between open/start/process is now clearer to the user
  Also, the `process` now exists in different flavours, that do not all hide the thread spawning. This offers more control to the end user.
* Traces can explictly be closed, and are closed if still open on Drop (maybe that was the case in ferrisetw 0.1, I'm not sure)
* This removes the distinction between TraceTrait and TraceBaseTrait

Sorry, I did not manager to split this large commit into smaller chunks.
It's probably easier to read only the result of it rather than the diffs, which do not make much sense since most of evntrace.rs and trace.rs are now diffs.
daladim added a commit to daladim/ferrisetw that referenced this issue Nov 7, 2022
Wrapping it into an Arc ensures we're not dropping it when the trace is stopped, but we're waiting for the potential callbacks to terminate first

This fixes "Race 2" in n4r1b#45 (n4r1b#45)
daladim added a commit to daladim/ferrisetw that referenced this issue Nov 7, 2022
daladim added a commit to daladim/ferrisetw that referenced this issue Nov 7, 2022
This largely refactors trace.rs and evntrace.rs
* evntrace now solely contains safe wrappers over Windows API functions, without any internal state (struct NativeEtw has been removed)
  that's the duty of trace.rs to handle these API correctly and in the right order.
* Traces instances are now created with a Builder pattern, to clearly mark which fields may be mutable and which will stay constant.
  This will make it very easy to fix races in issue n4r1b#45

Also, as minor improvements:
* the builder now enforces the trace name is truncated to TRACE_NAME_MAX_CHARS, so that both EVENT_TRACE_LOGFILEW and EVENT_TRACE_PROPERTIES have consistently truncated logger names
* TraceData is renamed CallbackData.
  That's mainly a matter of taste, even though it makes its intent clearer, and thus makes it easier to review the `unsafe` blocks
* errors from evntrace are now better forwarded to callers
* checks for invalid handles from the Windows API has been made more explicit
* the public API for traces (and trace builder) is now simplified, and hides some of the "really weird" (to say the least) design choices of ETW.
  Distinction between open/start/process is now clearer to the user
  Also, the `process` now exists in different flavours, that do not all hide the thread spawning. This offers more control to the end user.
* Traces can explictly be closed, and are closed if still open on Drop (maybe that was the case in ferrisetw 0.1, I'm not sure)
* This removes the distinction between TraceTrait and TraceBaseTrait

Sorry, I did not manage to split this large commit into smaller chunks.
It's probably easier to read only the result of it rather than the diffs, which do not make much sense since most of evntrace.rs and trace.rs are now diffs.
daladim added a commit to daladim/ferrisetw that referenced this issue Nov 7, 2022
Wrapping it into an Arc ensures we're not dropping it when the trace is stopped, but we're waiting for the potential callbacks to terminate first

This fixes "Race 2" in n4r1b#45 (n4r1b#45)
daladim added a commit to daladim/ferrisetw that referenced this issue Nov 7, 2022
@daladim
Copy link
Collaborator Author

daladim commented Jan 13, 2023

Both races have been fixed, and fixes are included in ferrisetw 1.0

@daladim daladim closed this as completed Jan 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant