-
-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Multiple issues in SentryReachability #3338
Conversation
The tests can launch multiple threads of the ANR tracker, which call dealloc of SentryReachability simultaneously, leading to a segfault while iterating over the observers in SentryReachability. This is fixed now by using some synchronization around the observer's access. Furthermore, the bookkeeping of the observers is implemented now with weak references using an NSHashTable. Instead of relying on dealloc to be called for removing the observers, each observer must now call removeObserver after subscribing. If an observer doesn't call removeObserver no memory is leaked cause the SentryReachability uses weak references. Previously, the list of observers used a dictionary with the description method of Objective-C as the key, which is suboptimal cause the description could change between adding and removing the observer, leading to memory leaks when the observer's description changed between invocations.
Performance metrics 🚀
|
Revision | Plain | With Sentry | Diff |
---|---|---|---|
1bbcb9c | 1192.51 ms | 1231.96 ms | 39.45 ms |
160a073 | 1260.72 ms | 1270.10 ms | 9.38 ms |
3b782cc | 1261.67 ms | 1272.24 ms | 10.57 ms |
7bc3c0d | 1259.74 ms | 1268.45 ms | 8.71 ms |
f587451 | 1271.63 ms | 1275.90 ms | 4.27 ms |
df2986b | 1215.20 ms | 1235.02 ms | 19.82 ms |
32e64d1 | 1224.45 ms | 1238.94 ms | 14.49 ms |
2d5b1bd | 1244.04 ms | 1255.18 ms | 11.14 ms |
c78683b | 1255.78 ms | 1261.94 ms | 6.16 ms |
46c6025 | 1213.87 ms | 1253.06 ms | 39.19 ms |
App size
Revision | Plain | With Sentry | Diff |
---|---|---|---|
1bbcb9c | 20.76 KiB | 426.10 KiB | 405.34 KiB |
160a073 | 22.85 KiB | 408.88 KiB | 386.03 KiB |
3b782cc | 20.76 KiB | 432.21 KiB | 411.45 KiB |
7bc3c0d | 20.76 KiB | 427.35 KiB | 406.59 KiB |
f587451 | 20.76 KiB | 435.25 KiB | 414.49 KiB |
df2986b | 22.85 KiB | 406.89 KiB | 384.04 KiB |
32e64d1 | 20.76 KiB | 433.18 KiB | 412.42 KiB |
2d5b1bd | 22.84 KiB | 403.52 KiB | 380.67 KiB |
c78683b | 20.76 KiB | 435.24 KiB | 414.47 KiB |
46c6025 | 22.85 KiB | 408.84 KiB | 385.99 KiB |
Previous results on branch: fix/reachability-crash
Startup times
Revision | Plain | With Sentry | Diff |
---|---|---|---|
8e131db | 1232.88 ms | 1252.44 ms | 19.56 ms |
bccf130 | 1197.15 ms | 1223.64 ms | 26.49 ms |
18ba1ad | 1228.34 ms | 1251.46 ms | 23.12 ms |
5b1adbc | 1237.02 ms | 1243.84 ms | 6.82 ms |
9eada8d | 1221.07 ms | 1248.74 ms | 27.67 ms |
App size
Revision | Plain | With Sentry | Diff |
---|---|---|---|
8e131db | 22.85 KiB | 408.82 KiB | 385.97 KiB |
bccf130 | 22.85 KiB | 409.45 KiB | 386.60 KiB |
18ba1ad | 22.85 KiB | 408.82 KiB | 385.97 KiB |
5b1adbc | 22.85 KiB | 410.39 KiB | 387.54 KiB |
9eada8d | 22.85 KiB | 410.31 KiB | 387.46 KiB |
Codecov Report
@@ Coverage Diff @@
## main #3338 +/- ##
=============================================
+ Coverage 89.205% 89.310% +0.104%
=============================================
Files 501 502 +1
Lines 54085 54410 +325
Branches 19203 19544 +341
=============================================
+ Hits 48247 48594 +347
+ Misses 4860 4845 -15
+ Partials 978 971 -7
... and 70 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the description could change between adding and removing the observer
Ouch. Didn't know this when I made some changes around here in #3232. How does the description change, a new pointer address? Or is it something we're overriding? I looked for such overrides but we don't provide a custom description for SentryBreadcrumbTracker or SentryHttpTransport. Great catch in any case.
I looked at some of the failing tests, and it looks like testIntegration_UrlSessionDelegate_PassedToRequestManager
is failing on CI for iOS/tvOS because it's trying to make a real network request, and I'm guessing some kind of configuration issue is preventing it from succeeding? Or maybe an intermittent issue? Not sure why it'd only start failing now. But I think it might make more sense anyways for that to be pointed at the localhost test server vs actual the real sentry.io host over the internet:
2023-10-09 13:25:29.398425+0000 xctest[7127:38848] Task <F2188CA5-B9D5-427E-BF75-CB882EED47F7>.<1> finished with error [-1001] Error Domain=NSURLErrorDomain Code=-1001 "The request timed out." UserInfo={_kCFStreamErrorCodeKey=-2102, NSUnderlyingError=0x600004b11800 {Error Domain=kCFErrorDomainCFNetwork Code=-1001 "(null)" UserInfo={_kCFStreamErrorCodeKey=-2102, _kCFStreamErrorDomainKey=4}}, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask <F2188CA5-B9D5-427E-BF75-CB882EED47F7>.<1>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
"LocalDataTask <F2188CA5-B9D5-427E-BF75-CB882EED47F7>.<1>"
), NSLocalizedDescription=The request timed out., NSErrorFailingURLStringKey=https://app.getsentry.com/api/12345/envelope/, NSErrorFailingURLKey=https://app.getsentry.com/api/12345/envelope/, _kCFStreamErrorDomainKey=4}
2023-10-09 13:25:29.398693+0000 xctest[7127:38848] [Sentry] [debug] [SentryRequestOperation:34] Request status: 0
2023-10-09 13:25:29.398935+0000 xctest[7127:38848] [Sentry] [error] [SentryRequestOperation:42] Request failed: Error Domain=NSURLErrorDomain Code=-1001 "The request timed out." UserInfo={_kCFStreamErrorCodeKey=-2102, NSUnderlyingError=0x600004b11800 {Error Domain=kCFErrorDomainCFNetwork Code=-1001 "(null)" UserInfo={_kCFStreamErrorCodeKey=-2102, _kCFStreamErrorDomainKey=4}}, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask <F2188CA5-B9D5-427E-BF75-CB882EED47F7>.<1>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
"LocalDataTask <F2188CA5-B9D5-427E-BF75-CB882EED47F7>.<1>"
), NSLocalizedDescription=The request timed out., NSErrorFailingURLStringKey=https://app.getsentry.com/api/12345/envelope/, NSErrorFailingURLKey=https://app.getsentry.com/api/12345/envelope/, _kCFStreamErrorDomainKey=4}
2023-10-09 13:25:29.399393+0000 xctest[7127:38848] [Sentry] [debug] [SentryQueueableRequestManager:42] Queued requests: 0
2023-10-09 13:25:29.399562+0000 xctest[7127:38848] [Sentry] [debug] [SentryHttpTransport:340] WeakSelf is nil. Not doing anything.
I'm not sure yet why testMultipleReachabilityObservers
is failing on macOS, nothing obvious is being logged yet, and it passes for me locally, I added some debug logs in 7cd6146.
category:@"device.connectivity"]; | ||
crumb.type = @"connectivity"; | ||
crumb.data = [NSDictionary dictionaryWithObject:typeDescription forKey:@"connectivity"]; | ||
[self.delegate addBreadcrumb:crumb]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we'd made this self.delegate
a weak self instead, would we have broken a block capture causing issues here? Just trying to understand why this works if you break use the delegate callback pattern instead of ObjC block callback.
ETA: I believe NSHashTable uses -[NSObject hash]
as the key for the objects; couldn't we use the NSDictionary with the hashes as keys instead of the problematic descriptions? Moot point I guess since you've done the work, but if that would work we might consider doing that to minimize git history churn. We could swap out NSDictionary for NSMapTable to get the weak object semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The self.delegate
is already weak reference see
@property (nonatomic, weak) id<SentryBreadcrumbDelegate> delegate; |
I personally prefer the delegate pattern, cause you don't need to worry about using weak references in all the block if you set it up properly. Otherwise, you have to ensure using weak references in the blocks which is easier to get wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couldn't we use the NSDictionary with the hashes as keys instead of the problematic descriptions
Yes, if we use NSMapTable
as pointed out by you, but I would prefer the NSHashTable
, cause we don't have to worry about how to come up with the keys and write .allValues
two times. I can change it if you prefer. No strong opinion on this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, I'm fine with all this!
if (sentry_current_reachability_state != kSCNetworkReachabilityFlagsUninitialized) { | ||
return; | ||
} | ||
if (sentry_reachability_observers.count > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if we remove all observers, and then add one again? It will reenter the logic below, is that desired? I still think this would be better with a dispatch_once
to avoid that situation and also it may just be more optimized than calling objc_msgSend for count and comparing the integer value.
ETA: I see how it's cleaned up after reading below. I would say at a minimum leave a comment explaining here, but ideally we'd have some more semantic code wrapping init/teardown. You already have the teardown wrapped, but the init is a bit unclear here. In general though I like your solution best because we release the memory for the unused queue and stop callbacks from SCNetworkReachabilitySetCallback being sent when there are no observers. Nice work 👍🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dispatch once caused some problems in the tests. Yes, I want to properly teardown everything. I will include your feedback. Thanks 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yeah that is another good point 👍🏻 guess we need a SentryDispatchOnce wrapper we can subclass for tests 😆
I think we don't, but such a change could happen any time in the future. |
Co-authored-by: Andrew McKnight <[email protected]>
Thanks for looking into it @armcknight, but the problem was that my test here sentry-cocoa/Tests/SentryTests/Networking/SentryReachabilitySwiftTests.swift Lines 5 to 13 in 51d8267
created hundreds of threads cause when adding the observer I didn't check if it's already in the list and always created a new dispatch queue, which I do know: sentry-cocoa/Sources/Sentry/SentryReachability.m Lines 152 to 158 in 51d8267
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for working it out @philipphofmann 🥔 I added/tweaked a few logs and will merge it once CI runs again.
📜 Description
The tests can launch multiple threads of the ANR tracker, which call dealloc of SentryReachability simultaneously, leading to a segfault while iterating over the observers in SentryReachability. This is fixed now by using some synchronization around the observer's access. Furthermore, the bookkeeping of the observers is implemented now with weak references using an NSHashTable. Instead of relying on dealloc to be called for removing the observers, each observer must now call removeObserver after subscribing. If an observer doesn't call removeObserver no memory is leaked cause the SentryReachability uses weak references. Previously, the list of observers used a dictionary with the description method of Objective-C as the key, which is suboptimal cause the description could change between adding and removing the observer, leading to memory leaks when the observer's description changed between invocations.
💡 Motivation and Context
A test in CI contained the following crash report.
Crash Report
Some links to crash reports with the same root cause:
💚 How did you test it?
Unit tests and on a real device to ensure the logic of reachability is still working.
📝 Checklist
You have to check all boxes before merging:
sendDefaultPII
is enabled.🔮 Next steps