
unable to call debug_traceBlockByHash on large blocks #984

Closed
saurik opened this issue Dec 31, 2021 · 3 comments
Labels
bug Something isn't working

Comments

saurik (Contributor) commented Dec 31, 2021

Blocks have been getting larger, and with them the result of debug_traceBlockByHash. I'm trying to trace block #8942019, hash 0x112b4b6bf1097fa43a8f7406310184040a3a63716091fa3c3322d434617840d6, and I get no result at all. I've managed to determine that there is no issue generating the answer; rather, writeJSONSkipDeadline is returning an error that is then eaten by handleMsg. I modified coreth to output the error and found that, oddly, it was due to a grpc client in avalanchego refusing to receive all of the data.

rpc error: code = ResourceExhausted desc = grpc: received message larger than max (60044844 vs. 4194304)

I've determined that the following patch allows me to get the full response (note that it targets the second of two blocks of repeated, similar code in ServeHTTP, the one for the gresponsewriter). math.MaxInt32 is what grpc uses internally as the default for some other similar limits, and it is also what the go-plugin project uses to disable all such limits in its client, so I believe it is more correct than merely setting a larger arbitrary limit, assuming I am understanding correctly that this path carries responses from APIs (and a user can't connect and send an unbounded amount of data to this server).

diff --git a/vms/rpcchainvm/ghttp/http_client.go b/vms/rpcchainvm/ghttp/http_client.go
index fb53d74b4..14a944ff3 100644
--- a/vms/rpcchainvm/ghttp/http_client.go
+++ b/vms/rpcchainvm/ghttp/http_client.go
@@ -4,6 +4,7 @@
 package ghttp

 import (
+       "math"
        "net/http"

        "google.golang.org/grpc"
@@ -49,6 +50,7 @@ func (c *Client) ServeHTTP(w http.ResponseWriter, r *http.Request) {
        })
        writerID := c.broker.NextId()
        go c.broker.AcceptAndServe(writerID, func(opts []grpc.ServerOption) *grpc.Server {
+               opts = append(opts, grpc.MaxRecvMsgSize(math.MaxInt32))
                writer := grpc.NewServer(opts...)
                closer.Add(writer)
                gresponsewriterproto.RegisterWriterServer(writer, gresponsewriter.NewServer(w, c.broker))
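
(For reference: the 4194304 in the error above is grpc-go's default MaxRecvMsgSize, 4 MiB = 4 * 1024 * 1024; math.MaxInt32 effectively removes the cap rather than trading one arbitrary number for another.)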

saurik added the bug label Dec 31, 2021

saurik (Contributor, Author) commented Jan 1, 2022

LOL. Block 0x145064ae22b9e189e4adbc9a536f8f42a833540ca7881b5cbeeb43ee687bbce2 (for example) has a single transaction, but if you enable memory tracing it takes more than math.MaxInt32 bytes to store the JSON.

rpc error: code = ResourceExhausted desc = trying to send message larger than max (3146406314 vs. 2147483647)
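
(Note that 2147483647 is math.MaxInt32 itself: the very ceiling raised by the patch above is now what's being hit, this time on the send side.)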

I noticed that grpc internally uses int(^uint(0) >> 1) in some places instead of math.MaxInt32, so I started playing whack-a-mole with the various limits, raising them one by one to let me get my API result, until I finally ran into one I couldn't fix without modifying the go-plugin library: vms/rpcchainvm/vm_client.go calls vm.broker.Dial, which in turn calls dialGRPCConn; that function lets you set overrides on the grpc options, but is missing an argument that would allow you to pass them through (which is weird, as they have that in most other places).
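
For concreteness, the options I'd want to pass through look something like this (a generic grpc-go sketch, not actual avalanchego or go-plugin code; dialUncapped and the package name are hypothetical):

package example

import (
	"math"

	"google.golang.org/grpc"
)

// dialUncapped shows the dial options go-plugin's dialGRPCConn would need to
// accept from callers: grpc-go defaults to a 4 MiB per-message receive cap on
// the client side, while sends already default to math.MaxInt32.
func dialUncapped(addr string) (*grpc.ClientConn, error) {
	return grpc.Dial(addr,
		grpc.WithInsecure(),
		grpc.WithDefaultCallOptions(
			grpc.MaxCallRecvMsgSize(math.MaxInt32), // lift the 4 MiB receive cap
			grpc.MaxCallSendMsgSize(math.MaxInt32), // make the send cap explicit
		),
	)
}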

FWIW, since the mechanism you have for passing the response back to the HTTP server appears to support multiple calls to Write, the core issue with writing this JSON (ignoring the higher-level issues with this API, which I feel fine ignoring as I have hundreds of gigabytes of RAM) is that Go's JSON encoder is awkwardly layered: not only does it not support SAX-style streaming, it internally buffers its entire result and then writes it in a single shot to the io.Writer... at that point the API frankly may as well just return a buffer rather than pretend at reasonable design.
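
You can see that buffering behavior in isolation with a toy writer that logs the size of each Write it receives (standalone demo, nothing coreth-specific); the encoder hands over the whole document in one call:

package main

import (
	"encoding/json"
	"fmt"
)

// writeLogger records the size of every Write call it receives.
type writeLogger struct{ sizes []int }

func (w *writeLogger) Write(p []byte) (int, error) {
	w.sizes = append(w.sizes, len(p))
	return len(p), nil
}

func main() {
	big := make([]string, 100000)
	for i := range big {
		big[i] = "padding"
	}
	var w writeLogger
	_ = json.NewEncoder(&w).Encode(big)
	fmt.Println(w.sizes) // one big Write of ~1 MB, not a stream of small ones
}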

So it could be that some kind of re-buffering io.Writer, one that takes incoming large Write calls and breaks them up into smaller writes, would be sufficient to fix this; a sketch follows. (It just feels really weird, as what I'd expect is for the JSON encoder to do tiny writes, and then, instead of it having its own buffer, you could add a layer similar to a java.io.ByteArrayOutputStream if you wanted to collect the buffer, or something similar to a java.io.BufferedOutputStream if you wanted to aggregate the writes into larger chunks, which would make this stack more intuitive.)
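
A minimal sketch of what I mean (chunkWriter and the package name are hypothetical; the chunk size would be tuned to whatever the transport tolerates):

package chunkio

import "io"

// chunkWriter forwards data to an underlying io.Writer, splitting any
// payload larger than chunkSize into multiple smaller Write calls.
type chunkWriter struct {
	w         io.Writer
	chunkSize int
}

func (c *chunkWriter) Write(p []byte) (int, error) {
	var written int
	for len(p) > 0 {
		n := len(p)
		if n > c.chunkSize {
			n = c.chunkSize
		}
		m, err := c.w.Write(p[:n])
		written += m
		if err != nil {
			return written, err
		}
		p = p[n:]
	}
	return written, nil
}

Wrapping the response writer in something like this would keep each message under the default grpc limits without having to raise them at all.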

Another option maybe worth considering is having these channels use the grpc compression mechanism I kept coming across while trying to fix all of these limits: these JSON results compress extremely well (which is why I'm bothering to collect these files). Even at zstd -3 this 3G file compresses down to 7M (whereas gzip apparently sucks, probably due to a window size difference, despite using a lot more CPU... 690M? I swear I've checked round-tripping this through zstd and it really is just 7M; and zstd -19, which admittedly takes a long time, gets it all the way down to 5M ;P).
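
For what it's worth, opting into grpc-go's built-in gzip support is cheap on the client side (a sketch with a hypothetical helper; zstd would require registering a third-party encoding.Codec on both ends):

package example

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/encoding/gzip" // importing this registers the gzip compressor
)

// dialCompressed gzips every message sent over the returned connection; a
// grpc-go server that has the same compressor registered will typically
// reply in kind.
func dialCompressed(addr string) (*grpc.ClientConn, error) {
	return grpc.Dial(addr,
		grpc.WithInsecure(),
		grpc.WithDefaultCallOptions(grpc.UseCompressor(gzip.Name)),
	)
}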

StephenButtolph (Contributor) commented

I made a PR to implement your suggestion for GRPCBroker.Dial: hashicorp/go-plugin#185. It looks like the last PR merged there landed in August, so I'm not sure how high my hopes are that it will be merged quickly.

As for setting the remaining values: you're correct, there's no reason to limit the message sizes. I'll make a PR to address that.

StephenButtolph (Contributor) commented

We removed the usage of the broker in v1.7.10, which means the above PR to hashicorp should no longer be needed. I'm going to close this, as I think it should be fully resolved now... Feel free to reopen if I'm missing something.
