*: add invocations to applicationlog #3569

Open. Wants to merge 4 commits into base: master

@ixje (Contributor) commented Sep 3, 2024

Problem

neo-project/neo#3386

Solution

Implement as an extension. The discussion was moved here from Dora's backend PR.

To do

  • copy arguments to avoid modifications
  • limit the total number of argument stack items in a single transaction (for safety)
  • make this a configurable feature
  • include native contract calls

Do we want to limit the stack item depth (think MaxJSONDepth) or are we content with just limiting the total stack arguments?

Member

@AnnaShaleva AnnaShaleva left a comment

A good prototype, but I have several design questions that we should solve before review.

pkg/core/interop/contract/call.go (outdated)
Comment on lines 73 to 76
ic.InvocationCalls = append(ic.InvocationCalls, state.ContractInvocation{
Hash: u,
Method: method,
Params: stackitem.NewArray(args),
Member

Regarding size restrictions: let's summarize what has been discussed in https://github.com/CityOfZion/dora-server-NeoN3/pull/220#issuecomment-2325214725. A restriction is needed on the number of recorded invocations or on the number of arguments per recorded invocation (or across all recorded invocations). I'd suggest restricting both:

  • The overall number of invocations is restricted in depth by the invocation stack size, but it's not restricted in length (only by the execution GAS limit). This problem should be solved the same way as the restriction on the number of Notifications (Investigate System.Runtime.Notify refcounting #3490). Currently the NeoGo node does not restrict it, but the C# node may (which means we have a bug in NeoGo), or it may not be restricted at all (which means we have a cross-implementation bug). So we first need to solve Investigate System.Runtime.Notify refcounting #3490 and then port that solution to the invocations restriction.
  • The number of arguments per every recordable invocation may be restricted using the same approach as for Notifications: serialize it and check the resulting size:
    bytes, err := ic.DAO.GetItemCtx().Serialize(elem.Item(), false)

    If serialization fails (due to the presence of recursive structures) or the arguments' size exceeds the limit, then this invocation won't be stored in the database.

These size restrictions should be well-documented, and it should be noted that the invocation tracking system may miss some invocations, so users can't rely completely on the recorded invocations.
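The size check described above can be sketched roughly as follows. This is a minimal illustration and not the node's code: `serializeArgs` is a stand-in for the real stack item serializer, and `maxArgsSize`, `invocation` and `record` are hypothetical names introduced only for this example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// maxArgsSize is a hypothetical cap on serialized arguments, in bytes.
const maxArgsSize = 1024

// errTooBig is returned when the serialized arguments exceed the limit.
var errTooBig = errors.New("serialized arguments exceed size limit")

// serializeArgs is a stand-in for the node's stack item serializer: it
// length-prefixes every argument and aborts once limit is exceeded.
func serializeArgs(args [][]byte, limit int) ([]byte, error) {
	var buf bytes.Buffer
	for _, a := range args {
		var l [4]byte
		binary.LittleEndian.PutUint32(l[:], uint32(len(a)))
		buf.Write(l[:])
		buf.Write(a)
		if buf.Len() > limit {
			return nil, errTooBig
		}
	}
	return buf.Bytes(), nil
}

// invocation is a hypothetical record; argBytes caches the serialized form
// so it can be reused when the record is written to the database.
type invocation struct {
	method   string
	argBytes []byte
}

// record appends an invocation only when its arguments fit the limit and
// reports whether it was stored.
func record(recs []invocation, method string, args [][]byte) ([]invocation, bool) {
	b, err := serializeArgs(args, maxArgsSize)
	if err != nil {
		return recs, false
	}
	return append(recs, invocation{method: method, argBytes: b}), true
}

func main() {
	var recs []invocation
	recs, ok := record(recs, "transfer", [][]byte{{1, 2, 3}})
	fmt.Println(ok, len(recs))
	recs, ok = record(recs, "big", [][]byte{make([]byte, 2048)})
	fmt.Println(ok, len(recs))
}
```

Keeping the serialized bytes on the record is what would let the storage path reuse them instead of serializing a second time.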

Contributor Author

  1. While I agree it should be restricted, I don't think this point should be blocking (just like it is not for notifications).
  2. This seems to count the total amount of bytes of all stack items. I don't understand how that translates to parameter count of the method called. I can see it used to ensure it doesn't exceed a maximum total item count in the Array though.

I don't think we should throw away the complete invocation record because of a parameter violation. If it's recorded on chain then it was apparently still a valid invocation, regardless if we want to store all of the invocation details. Let's say it exceeds one of the limits: we could still store the contract hash, the method and maybe the parameter count, and skip the actual parameters. We could perhaps also add an is_valid field to the JSON output as an indicator of how much of the information can be trusted.

i.e.
Valid

"invocations": [
    {
        "contract_hash": "0x49cf4e5378ffcd4dec034fd98a174c5491e395e2",
        "method": "designateAsRole",
        "is_valid": true,
        "parameters": {
            "type": "Array",
            "value": [
                {
                    "type": "Integer",
                    "value": "4"
                }
            ]
        }
    }
]

Invalid

"invocations": [
    {
        "contract_hash": "0x49cf4e5378ffcd4dec034fd98a174c5491e395e2",
        "method": "designateAsRole",
        "is_valid": false,
        "parameters": null
    }
]

Member

I don't think this point should be blocking

Agree, let's then leave this restriction to #3490 and finalize Invocation logs without it.

This seems to count the total amount of bytes of all stack items. I don't understand how that translates to parameter count of the method called.

I don't think that we need to stick to the number of parameters restriction, because the only thing that bothers us during Invocations collection is the resulting size of the serialized Invocations structure. We don't want it to be large and to take a lot of disk space. Serialize lets us ensure that serialized parameters do not occupy a lot of space, and it also takes care of the overall number of serialized elements (SerializationContext.limit is responsible for that). Thus I consider Serialize to be a perfect candidate for the arguments size restriction.

Also, we may cache the result of Serialize once, and then reuse it while storing Invocations on disk.

Contributor Author

I believe Roman was also concerned about the processing time, but I'll let him comment to see if he agrees using serialize is sufficient.

Member

Processing time is important, but we need to serialize arguments anyway to store Invocation logs, and if we reuse the result of Serialize, then processing time doesn't increase a lot.

And of course, the node should have a setting to disable Invocation logs, because it's a resource-demanding feature. Most of the public nodes likely won't have this feature enabled.

Contributor Author

And of course, the node should have a setting to disable Invocation logs, because it's a resource-demanding feature. Most of the public nodes likely won't have this feature enabled.

Yeah that's fine. Making it configurable is on my to-do list. In which section should I put the option: ApplicationConfiguration.RPC or ApplicationConfiguration (because it also affects the indexing behaviour)?

Member

To me it's more an ApplicationConfiguration-level setting, in particular, I'd place it into config.Ledger structure. You're right in that it affects the database behaviour, and it's possible that some other node services should be able to reach this setting in future.

Member

ApplicationConfiguration is OK, ideally we should make DBs compatible with/without this option.

Side note: serialization is also kinda a snapshot of items, so DeepCopy can even be avoided if we're checking the size via serialization (for notifications copying is important since this data can be reused in the same context for System.Runtime.GetNotifications).

Comment on lines 73 to 76
ic.InvocationCalls = append(ic.InvocationCalls, state.ContractInvocation{
Hash: u,
Method: method,
Params: stackitem.NewArray(args),
Member

This "flattened" way of invocations tracking is missing depth, so that given [ContractACall, ContractBCall, ContractCCall] it's impossible to say whether contract B calls contract C internally or contract A calls both B and C subsequently. If comparing with VM-level InvocationsTree, then InvocationsTree gives a clear understanding of calls depth and nesting relationship, which is good for the user:

neo-go/pkg/vm/vm.go

Lines 395 to 397 in d47fe39

newTree := &invocations.Tree{Current: ctx.ScriptHash()}
curTree.Calls = append(curTree.Calls, newTree)
ctx.sc.invTree = newTree

However, using VM InvocationsTree in the current state is impossible, because it does not track call arguments. And it's a problem to make it track call arguments because it only has access to loading context with contract scripthash, and arguments are loaded by interop handlers. This problem may be solved with some additional VM callback.

So the question is: do we need nested relationship to be present in the resulting invocations log? It's important to solve this design question before the implementation.

Contributor Author

For the Dora use-case we don't need this information. Keeping it flat would be similar to notifications where we also can't tell who triggered them (i.e. was it user calling contractA which calls Contract B, or was it user calling ContractA and user calling ContractB using 2 System.Contract.Calls in a tx.script).

However, it does seem like this information can be useful to somebody somewhere down the road and changing it later on is going to be a hassle. What would it look like? An option could be

type ContractInvocation struct {
	Hash        util.Uint160         `json:"contract_hash"`
	Method      string               `json:"method"`
	Arguments   *stackitem.Array     `json:"arguments"`
	IsValid     bool                 `json:"is_valid"`
	Invocations []ContractInvocation `json:"invocations"`
}
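To illustrate what a nested record would capture that a flat list cannot, here is a minimal sketch of building such a tree with a stack of currently executing calls. The `Tracker` type and its `Enter`/`Exit` hooks are hypothetical; how the real VM would signal context load and unload is not settled in this thread.

```go
package main

import "fmt"

// Invocation is a hypothetical nested record: Calls holds the invocations
// made from within this one.
type Invocation struct {
	Contract string
	Method   string
	Calls    []*Invocation
}

// Tracker builds the tree; stack holds the chain of currently executing calls.
type Tracker struct {
	Roots []*Invocation
	stack []*Invocation
}

// Enter records a call made by whatever invocation is on top of the stack,
// or a root call when the stack is empty.
func (t *Tracker) Enter(contract, method string) {
	inv := &Invocation{Contract: contract, Method: method}
	if n := len(t.stack); n > 0 {
		parent := t.stack[n-1]
		parent.Calls = append(parent.Calls, inv)
	} else {
		t.Roots = append(t.Roots, inv)
	}
	t.stack = append(t.stack, inv)
}

// Exit marks the current invocation as finished.
func (t *Tracker) Exit() {
	t.stack = t.stack[:len(t.stack)-1]
}

func main() {
	var tr Tracker
	tr.Enter("A", "run")  // the entry script calls A
	tr.Enter("B", "step") // A calls B
	tr.Enter("C", "leaf") // B calls C
	tr.Exit()
	tr.Exit()
	tr.Exit()
	fmt.Println(len(tr.Roots), len(tr.Roots[0].Calls), len(tr.Roots[0].Calls[0].Calls))
}
```

With this shape, "A calls B, which calls C" and "A calls B and then C" produce different trees, whereas a flat list yields [A, B, C] in both cases.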

Member

An option could be

Agree, that is in fact how the VM InvocationsTree works.

But regarding 1D (flattened) / 2D (nested) structure of Invocations: I think we need some third opinion on this topic. Personally, I vote for the nested structure because it contains more information which may be useful in some cases, and especially for contract calls debugging.

Member

Proper call tree would have a higher price. I'm OK with keeping it this way if it doesn't have significant performance penalty.

Comment on lines 73 to 76
ic.InvocationCalls = append(ic.InvocationCalls, state.ContractInvocation{
Hash: u,
Method: method,
Params: stackitem.NewArray(args),
Member

One more question: how should contract exceptions be handled given this way of counting Invocations? E.g. for Notifications we revert the whole set of notifications on context unloading if an exception was raised and no catch block is present in this context:

ic.Notifications = ic.Notifications[:baseNtfCount] // Rollback all notification changes made by current context.

Should we do the same thing for Invocations? On the one hand, these invocations were handled by the VM and we can't just throw them out of the Invocations list; the VM InvocationsTree records all invocations irrespective of exceptions. On the other hand, all side effects of these invocations (notifications, contract storage changes) were reverted, so why should we include them in the list if the engine state remains as it was before these invocations?
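If Invocations were to follow the Notifications behaviour, the rollback could look roughly like the sketch below. `Context` and `runFrame` are hypothetical stand-ins for the interop context and for context unloading; only the slice truncation mirrors the notifications line quoted above.

```go
package main

import (
	"errors"
	"fmt"
)

// errFault stands in for an uncaught contract exception.
var errFault = errors.New("fault")

// Context is a hypothetical stand-in for the interop context.
type Context struct {
	Invocations []string
}

// runFrame executes body as one execution context. If body returns an
// error (an uncaught exception), every invocation recorded since the frame
// started is dropped, mirroring how notifications are rolled back.
func (c *Context) runFrame(body func() error) error {
	base := len(c.Invocations)
	if err := body(); err != nil {
		c.Invocations = c.Invocations[:base]
		return err
	}
	return nil
}

func main() {
	ctx := &Context{}
	ctx.Invocations = append(ctx.Invocations, "A.ok")
	err := ctx.runFrame(func() error {
		ctx.Invocations = append(ctx.Invocations, "B.fail")
		return errFault
	})
	fmt.Println(err, ctx.Invocations)
}
```

A "called, but reverted" status would instead keep the entries and flag them, at the cost of diverging from how notifications behave.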

Contributor Author

This is a very good (but hard) question. At first sight I'm leaning towards rolling back because the invocations list in the applicationlog is not intended as a debugging tool like the invocations tree but as a means to track successful contract invocations. Specifically for Dora it will not process transactions that did not end in a HALT state. Arguably this skews the smart contract invocation statistics, but that's what it is.

Looking at what the invocations entry means in the applicationlog in general, I think it should work like notifications. This immediately reminds me of #3189: how would you like to see this handled for transactions ending in a FAULT state? Include them to match the current notifications behaviour, or exclude them?

Member

is not intended as a debugging tool

My first thought was that we're developing a debugging tool :D But if not, then it would probably be better to follow the Notifications behaviour and revert the invocations tree. To me, one of the reasons a context's Notifications are reverted in case of an uncaught exception is the way the system tracks NEP17/NEP11 token transfers: for HALTed transactions, a Transfer notification with particular arguments is filtered out from the list of notifications and stored as a transfer record in the DB. That's why it's important to roll back unsuccessful notifications exactly like contract storage changes. So I'd expect Invocations to behave exactly like Notifications.

Regarding application logs, we'll have to fix the current behaviour of the NeoGo node for FAULT transactions. The current behaviour is helpful and I'd love to keep it, but it's just not correct.

Contributor Author

My first thought was that we're developing a debugging tool :D

My original motivation comes from the problems described in https://github.com/CityOfZion/dora-server-NeoN3/issues/219 but it definitely has the potential to be used for debugging as well. Perhaps the rollback can be disabled when used with diagnostics. If we choose the format to be nested then I think it becomes a more powerful version of the current invocationtree.

Member

@roman-khimov, what do you think about reverting invocations tree in case of exceptions, do we need it?

Member

Theoretically I'd love some "called, but reverted" status for them. But if practically consistent (reverted) result is sufficient then ok.

pkg/core/interop/contract/call.go (outdated)
@AnnaShaleva
Member

@roman-khimov, I think we need some third opinion on these topics.

@roman-khimov
Member

How about System.Runtime.LoadScript calls, btw?

@AnnaShaleva
Member

How about System.Runtime.LoadScript calls

It leads to the creation of a new execution context, thus it's a valid part of the invocation tree. But is this information useful in practice? Dynamic invocations are identified by the hash160 of the loaded script, so the user can't recover the script because only its hash is known. Still, we may include dynamic invocations in the resulting Invocations tree with some special field like isContractCall: false.

@ixje
Contributor Author

ixje commented Oct 31, 2024

Picking this up again. I rebased the branch onto latest master and processed some of the feedback. In particular

  • use stackitem.Serialize instead of deepcopy and re-use the results when storing the data
  • make the behaviour configurable through a SaveInvocations config option

Note: it was unclear to me based on #3569 (comment) whether I should make it a tree or keep it flat. I kept it flat for now.

If the feature is enabled the applicationlog output looks as follows

"invocations": [
                    {
                        "contract_hash": "0xd2a4cff31913016155e38e474a2c06d08be276cf",
                        "method": "transfer",
                        "arguments": {
                            "type": "Array",
                            "value": [
                                {
                                    "type": "ByteString",
                                    "value": "krOcd6pg8ptXwXPO2Rfxf9Mhpus="
                                },
                                {
                                    "type": "ByteString",
                                    "value": "AZelPVEEY0csq+FRLl/HJ9cW+Qs="
                                },
                                {
                                    "type": "Integer",
                                    "value": "1000000000000"
                                },
                                {
                                    "type": "Any"
                                }
                            ]
                        },
                        "arguments_count": 4,
                        "is_valid": true
                    }
                ]

and in disabled state it returns

"invocations": []

I'm looking for feedback on the above before taking care of covering System.Runtime.LoadScript calls

@ixje ixje requested a review from AnnaShaleva October 31, 2024 09:43
@ixje
Contributor Author

ixje commented Nov 14, 2024

@AnnaShaleva can this PR also get some review love please

@ixje ixje marked this pull request as ready for review November 19, 2024 10:26
Member

@roman-khimov roman-khimov left a comment

I don't see any limits for the overall size of saved data. While the feature is optional, we still need to protect the node from abuse.

@@ -4,7 +4,6 @@ import (
"encoding/json"
"errors"
"fmt"

Member

Shouldn't happen.

@@ -120,6 +191,7 @@ func (aer *AppExecResult) DecodeBinary(r *io.BinReader) {
aer.Stack = arr
r.ReadArray(&aer.Events)
aer.FaultException = r.ReadString()
r.ReadArray(&aer.Invocations)
Member

Makes current DB incompatible. We need to have this compatibility.
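One possible way to keep old records readable is to treat the new field as an optional trailing section that is decoded only when bytes remain. The sketch below uses a made-up byte layout and hypothetical `Result`, `readString` and `decode` names; it is not the AppExecResult format and not necessarily the approach this PR should take.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Result is a hypothetical record: old records end after Fault, new
// records append a count-prefixed list of invocations.
type Result struct {
	Fault       string
	Invocations []string
}

// readString reads a one-byte length followed by that many bytes.
func readString(r *bytes.Reader) (string, error) {
	n, err := r.ReadByte()
	if err != nil {
		return "", err
	}
	b := make([]byte, n)
	if _, err := io.ReadFull(r, b); err != nil {
		return "", err
	}
	return string(b), nil
}

// decode accepts both layouts: the invocations section is parsed only if
// data remains after the fields that old records already contain.
func decode(data []byte) (Result, error) {
	var res Result
	r := bytes.NewReader(data)
	fault, err := readString(r)
	if err != nil {
		return res, err
	}
	res.Fault = fault
	if r.Len() == 0 {
		return res, nil // old-format record, nothing more to read
	}
	count, err := r.ReadByte()
	if err != nil {
		return res, err
	}
	for i := 0; i < int(count); i++ {
		s, err := readString(r)
		if err != nil {
			return res, err
		}
		res.Invocations = append(res.Invocations, s)
	}
	return res, nil
}

func main() {
	oldRec := []byte{2, 'o', 'k'}
	newRec := []byte{0, 1, 1, 'x'}
	fmt.Println(decode(oldRec))
	fmt.Println(decode(newRec))
}
```

This only works when the optional section comes last and each record is stored with known boundaries, which is an assumption of the sketch.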

Method: method,
Arguments: stackitem.NewArray(args),
})
if ic.Chain.GetConfig().Ledger.SaveInvocations {
Member

GetConfig() for every call is too costly; we need a new field in ic initialized in NewContext() (just like Network and Hardforks are initialized currently).
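The suggestion above amounts to reading the setting once when the context is created. A minimal sketch with hypothetical `Chain`, `Config` and `Context` types (the real interop.Context has a different shape, and the `reads` counter exists only to make the effect visible):

```go
package main

import "fmt"

// Config is a hypothetical subset of the ledger configuration.
type Config struct {
	SaveInvocations bool
}

// Chain is a stand-in for the blockchain; reads counts GetConfig calls.
type Chain struct {
	cfg   Config
	reads int
}

// GetConfig returns a copy of the configuration.
func (c *Chain) GetConfig() Config {
	c.reads++
	return c.cfg
}

// Context caches the flag so the per-call path never touches the config.
type Context struct {
	saveInvocations bool
	Invocations     []string
}

// NewContext reads the configuration exactly once.
func NewContext(c *Chain) *Context {
	return &Context{saveInvocations: c.GetConfig().SaveInvocations}
}

// Call records the invocation only when the cached flag is set.
func (ic *Context) Call(method string) {
	if ic.saveInvocations {
		ic.Invocations = append(ic.Invocations, method)
	}
}

func main() {
	chain := &Chain{cfg: Config{SaveInvocations: true}}
	ic := NewContext(chain)
	ic.Call("transfer")
	ic.Call("balanceOf")
	fmt.Println(chain.reads, len(ic.Invocations))
}
```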

Hash util.Uint160 `json:"contract_hash"`
Method string `json:"method"`
Arguments *stackitem.Array `json:"arguments"`
ArgumentsCount uint32 `json:"arguments_count"`
Member

Why this one? In the optimistic case you have Arguments with some proper number of elements. In pessimistic, does it make any difference?

Contributor Author

The idea was that when IsValid = false we at least have some information about the argument count.

As said in #3569 (comment)

I don't think we should throw away the complete invocation record because of a parameter violation. If it's recorded on chain then it was apparently still a valid invocation, regardless if we want to store all of the invocation details

I don't recall a hard necessity for this in my original use-case for this PR, but I think it's cheap to do and harder to add later (assuming the C# node follows).

@@ -24,36 +24,28 @@ type ContractInvocation struct {
Hash util.Uint160 `json:"contract_hash"`
Method string `json:"method"`
Arguments *stackitem.Array `json:"arguments"`
ArgumentsBytes []byte `json:"arguments_bytes"`
Member

That's too many details for a public structure used in popular APIs. This should be hidden.

Arguments *stackitem.Array `json:"arguments"`
ArgumentsBytes []byte `json:"arguments_bytes"`
ArgumentsCount uint32 `json:"arguments_count"`
IsValid bool `json:"is_valid"`
Member

Notice we never use snake_case in JSON structures, this should be something different. Also, from the Go perspective it's very convenient to have the default boolean (false) to follow regular use case. Like we use truncated for various find* results. Maybe truncated is applicable here too.
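As a rough illustration of that naming suggestion, the sketch below uses a `truncated` flag whose zero value (false) covers the regular case, and JSON keys without snake_case. The field set and the `encode` helper are assumptions for this example, not the final structure.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Invocation is a hypothetical public record. Truncated defaults to false,
// so a fully recorded call needs no extra initialization.
type Invocation struct {
	Hash      string   `json:"hash"`
	Method    string   `json:"method"`
	Arguments []string `json:"arguments,omitempty"`
	Truncated bool     `json:"truncated"`
}

// encode marshals a record and returns an empty string on failure.
func encode(inv Invocation) string {
	b, err := json.Marshal(inv)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	full := Invocation{Hash: "0xabc", Method: "transfer", Arguments: []string{"1", "2"}}
	cut := Invocation{Hash: "0xabc", Method: "transfer", Truncated: true}
	fmt.Println(encode(full))
	fmt.Println(encode(cut))
}
```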

arr := stackitem.NewArray(args)
arrCount := len(args)
valid := true
argBytes := []byte{}
Member

Do you need this to be initialized to zero-length slice? Also, stylistically I'd prefer

var (
)

block here since you declare a lot of things.

@@ -69,6 +69,26 @@ func Call(ic *interop.Context) error {
return fmt.Errorf("method not found: %s/%d", method, len(args))
}
hasReturn := md.ReturnType != smartcontract.VoidType

if ic.Chain.GetConfig().Ledger.SaveInvocations {
arr := stackitem.NewArray(args)
Member

If you're serializing this is a useless variable.
