Fatal SocketException on InputStream.read - "Software caused connection abort" #2861

Open
1 task done
JornR94 opened this issue Jun 28, 2024 · 4 comments
Labels: bug, datastore, pending-community-response

Comments

@JornR94

JornR94 commented Jun 28, 2024

Before opening, please confirm:

Language and Async Model

Java

Amplify Categories

GraphQL API, DataStore

Gradle script dependencies

implementation 'com.amplifyframework:aws-api:2.16.1'
implementation 'com.amplifyframework:aws-datastore:2.16.1'

Environment information

------------------------------------------------------------
Gradle 8.0
------------------------------------------------------------

Build time:   2023-02-13 13:15:21 UTC
Revision:     62ab9b7c7f884426cf79fbedcf07658b2dbe9e97

Kotlin:       1.8.10
Groovy:       3.0.13
Ant:          Apache Ant(TM) version 1.10.11 compiled on July 10 2021
JVM:          17.0.6 (JetBrains s.r.o. 17.0.6+0-b2043.56-10027231)
OS:           Windows 10 10.0 amd64

Please include any relevant guides or documentation you're referencing

No response

Describe the bug

In our production app, I'm seeing a crash happen occasionally with the Amplify AWS SDK for Android. The crash is a Fatal SocketException. Full stack trace below:

Fatal Exception: ag.g
The exception could not be delivered to the consumer because it has already canceled/disposed the flow or the exception has nowhere to go to begin with. Further reading: https://github.com/ReactiveX/RxJava/wiki/What's-different-in-2.0#error-handling | DataStoreException{message=Failure performing sync query to AppSync., cause=ApiException{message=Could not retrieve the response body from the returned JSON, cause=java.net.SocketException: Software caused connection abort, recoverySuggestion=Sorry, we don’t have a recovery suggestion for this error.}, recoverySuggestion=Sorry, we don’t have a recovery suggestion for this error.}

io.reactivex.rxjava3.plugins.RxJavaPlugins.onError (RxJavaPlugins.java:367)
io.reactivex.rxjava3.internal.operators.single.SingleCreate$Emitter.onError (SingleCreate.java:82)
com.amplifyframework.datastore.appsync.AppSyncClient.lambda$sync$0 (AppSyncClient.java:115)
com.amplifyframework.api.aws.AppSyncGraphQLOperation$OkHttpCallback.onResponse (AppSyncGraphQLOperation.java:138)
okhttp3.internal.connection.RealCall$AsyncCall.run (RealCall.kt:539)
java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:644)
java.lang.Thread.run (Thread.java:1012)

Caused by com.amplifyframework.datastore.DataStoreException
Failure performing sync query to AppSync.

com.amplifyframework.datastore.appsync.AppSyncClient.lambda$sync$0 (AppSyncClient.java:115)
com.amplifyframework.api.aws.AppSyncGraphQLOperation$OkHttpCallback.onResponse (AppSyncGraphQLOperation.java:138)
okhttp3.internal.connection.RealCall$AsyncCall.run (RealCall.kt:539)
java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:644)
java.lang.Thread.run (Thread.java:1012)

Caused by com.amplifyframework.api.ApiException
Could not retrieve the response body from the returned JSON

com.amplifyframework.api.aws.AppSyncGraphQLOperation$OkHttpCallback.onResponse (AppSyncGraphQLOperation.java:138)
okhttp3.internal.connection.RealCall$AsyncCall.run (RealCall.kt:539)
java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:644)
java.lang.Thread.run (Thread.java:1012)

Caused by java.net.SocketException
Software caused connection abort

java.net.SocketInputStream.socketRead0 (SocketInputStream.java)
java.net.SocketInputStream.socketRead (SocketInputStream.java:118)
java.net.SocketInputStream.read (SocketInputStream.java:173)
java.net.SocketInputStream.read (SocketInputStream.java:143)
com.android.org.conscrypt.ConscryptEngineSocket$SSLInputStream.readFromSocket (ConscryptEngineSocket.java:983)
com.android.org.conscrypt.ConscryptEngineSocket$SSLInputStream.processDataFromSocket (ConscryptEngineSocket.java:947)
com.android.org.conscrypt.ConscryptEngineSocket$SSLInputStream.readUntilDataAvailable (ConscryptEngineSocket.java:862)
com.android.org.conscrypt.ConscryptEngineSocket$SSLInputStream.read (ConscryptEngineSocket.java:835)
okio.InputStreamSource.read (InputStreamSource.java:93)
okio.AsyncTimeout$source$1.read (AsyncTimeout.kt:128)
okio.RealBufferedSource.request (RealBufferedSource.kt:209)
okio.RealBufferedSource.require (RealBufferedSource.kt:202)
okhttp3.internal.http2.Http2Reader.nextFrame (Http2Reader.kt:90)
okhttp3.internal.http2.Http2Connection$ReaderRunnable.invoke (Http2Connection.kt:618)
okhttp3.internal.http2.Http2Connection$ReaderRunnable.invoke (Http2Connection.kt:609)
okhttp3.internal.concurrent.TaskQueue$execute$1.runOnce (TaskQueue.kt:102)
okhttp3.internal.concurrent.TaskRunner.runTask (TaskRunner.kt:117)
okhttp3.internal.concurrent.TaskRunner.access$runTask (TaskRunner.kt:42)
okhttp3.internal.concurrent.TaskRunner$runnable$1.run (TaskRunner.kt:66)
java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:644)
java.lang.Thread.run (Thread.java:1012)

It looks to me like a try/catch block might be missing somewhere in the flow of this error (in the DataStore code in com.amplifyframework.datastore or the GraphQL API code in com.amplifyframework.api). I also just filed another issue that seems related.

This exception affects about 2% of our users, which has a significant impact on our app's crash rate. I would love your help; please let me know if I can provide further details to help solve this.

As with the other issue, this may or may not be helpful: 63% of the exceptions occur on Samsung phones, which is far higher than Samsung's share of our user base. So the crash seems to occur disproportionately on Samsung devices, although it also happens on non-customized OEM builds such as Google Pixel phones, which account for 8% of these crashes.

Reproduction steps (if applicable)

No response

Code Snippet

// I'm pretty sure it's happening in the AWS Amplify SDK 

Log output

// Put your logs below this line


amplifyconfiguration.json

No response

GraphQL Schema

// Put your schema below this line

Additional information and screenshots

No response

@github-actions github-actions bot added the pending-triage Issue is pending triage label Jun 28, 2024
@mattcreaser mattcreaser added datastore DataStore category/plugins question General question labels Jun 28, 2024
@github-actions github-actions bot removed the pending-triage Issue is pending triage label Jun 28, 2024
@mattcreaser mattcreaser added bug Something isn't working and removed question General question labels Jun 28, 2024
@mattcreaser
Member

mattcreaser commented Jun 28, 2024

Thanks for the report @JornR94. As per the linked RxJava documentation, the apparent cause here is that a socket exception (which often just means the network dropped) occurred after the emitter that would have received the exception was already disposed.

While it's possible a bug could be fixed here on Amplify's side, this can also be worked around on the application side by ignoring such errors. The RxJava documentation has a good example; the relevant part is the ignoring of SocketException.

RxJavaPlugins.setErrorHandler(e -> {
    if (e instanceof UndeliverableException) {
        e = e.getCause();
    }
    if ((e instanceof IOException) || (e instanceof SocketException)) {
        // fine, irrelevant network problem or API that throws on cancellation
        return;
    }
    if (e instanceof InterruptedException) {
        // fine, some blocking code was interrupted by a dispose call
        return;
    }
    if ((e instanceof NullPointerException) || (e instanceof IllegalArgumentException)) {
        // that's likely a bug in the application
        Thread.currentThread().getUncaughtExceptionHandler()
            .uncaughtException(Thread.currentThread(), e);
        return;
    }
    if (e instanceof IllegalStateException) {
        // that's a bug in RxJava or in a custom operator
        Thread.currentThread().getUncaughtExceptionHandler()
            .uncaughtException(Thread.currentThread(), e);
        return;
    }
    // android.util.Log; the "RxJava" tag is an arbitrary choice
    Log.w("RxJava", "Undeliverable exception received, not sure what to do", e);
});
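
One caveat when adapting this handler to the trace in this issue: the SocketException there appears to be nested inside an ApiException, which is itself inside a DataStoreException, so unwrapping a single cause from the UndeliverableException would yield the DataStoreException rather than an IOException. Below is a minimal sketch of a helper that walks the whole cause chain instead. The class name `CauseChain`, the method name `hasIoCause`, and the depth limit are hypothetical choices made for illustration; none of them is part of RxJava or Amplify.

```java
import java.io.IOException;

// Hypothetical helper, not part of RxJava or Amplify.
final class CauseChain {
    // Arbitrary guard against pathological (cyclic) cause chains.
    private static final int MAX_DEPTH = 32;

    private CauseChain() {}

    /**
     * Returns true if the given throwable, or any throwable in its cause
     * chain, is an IOException. SocketException is a subclass of
     * IOException, so it is covered by the same check.
     */
    static boolean hasIoCause(Throwable t) {
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < MAX_DEPTH; cur = cur.getCause()) {
            if (cur instanceof IOException) {
                return true;
            }
            depth++;
        }
        return false;
    }
}
```

Inside the handler passed to RxJavaPlugins.setErrorHandler, a call such as `CauseChain.hasIoCause(e)` could then stand in for the direct `instanceof IOException` check, so that wrapped network errors are ignored as well.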

@JornR94
Author

JornR94 commented Jun 28, 2024

Hi @mattcreaser, thanks for the quick reply! That makes a lot of sense, let me implement that myself to prevent this exception from crashing my app then.

As a side note, I did file this other issue that's pretty similar, but it seems like that one is throwing a StreamResetException, which extends IOException, so it should also be covered by adding this error handler for RxJava 👍

Are there any plans for integrating this into the AWS Amplify SDK?
Thanks Matt!

@mattcreaser
Member

We'll need to do a little more investigation to see if we can catch these errors internally so that they don't propagate out. I took a quick look, but it wasn't immediately obvious where to do so.

We won't be adding an RxJavaPlugins.setErrorHandler call, however: it is incorrect for a library to do so, and it is only appropriate in the end application's code.

@JornR94
Author

JornR94 commented Jun 28, 2024

Makes sense! I can't recall 100%, but I don't think I saw this mentioned anywhere in the implementation docs for Amplify on Android. I think this would be very valuable information to add to the GraphQL/DataStore implementation docs, to prevent unexpected crashes like these after adding the Amplify SDK.

@tylerjroach tylerjroach added the pending-community-response Issue is pending response from the issue requestor label Aug 7, 2024

3 participants