Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PARKED] Collect timeout feature #107

Open
wants to merge 36 commits into
base: main
Choose a base branch
from
Open

Conversation

DylanC
Copy link
Contributor

@DylanC DylanC commented Feb 6, 2020

Feature will allow a timeout to be passed as argument. This argument will be used to delay execution of collect regarding the 0x lib.

@DylanC DylanC marked this pull request as ready for review February 7, 2020 10:55
@DylanC DylanC requested a review from goto-bus-stop February 10, 2020 10:28
index.js Outdated Show resolved Hide resolved
@goto-bus-stop
Copy link
Contributor

goto-bus-stop commented Feb 11, 2020

So 0x has this file that turns a v8 profiler log into an array of stack traces called ticks. Each element in the array is the stack trace at a particular point in time. Those stack traces will be turned into the flame graph that we see at the end in Clinic Flame. To implement the timeout filtering, we would want to remove from the ticks array the stack traces that were captured before our timeout.

The V8 profiler output has JSON objects with some data about each captured stack trace, and that is turned into our ticks array here:

https://github.com/davidmarkclements/0x/blob/ea239f31379922451456518479bf442499426853/lib/v8-log-to-ticks.js#L122-L126

Here, the tick variable is one of V8's JSON objects.

I think we can check that the stack trace was recorded after our timeout, and only do the ticks.push(stack.reverse()) call if that is true. So we have to check if the tick object was generated after our timeout had passed.

We can look at how these tick objects look by manually generating a V8 profile:

# outputs an isolate-0x123456789-12345-v8.log file
node --prof examples/simple/index.js
# turn it into JSON
node --prof-process --preprocess -j isolate-*-v8.log > log.json

(The 0x repository contains a file examples/simple/index.js that you can use to try this out)

The log.json file that we created with the second command will contain a large JSON object. We're interested in the "ticks" property, which contains these tick objects. "ticks" contains an array that looks like the below:

"ticks": [
    {
  "tm": 23385,
  "vm": 0,
  "s": [
    1, 11388029, 1, 12561680, 36939, 150, 36938, 989, 36936, 113, 36935, 80, 36934, 35, 36933, 90
  ]
},
...
]

So each tick object in our code above is going to be an object like { tm: number, vm: number, s: [array of numbers] }.

  • s is our stack trace. to save space, it uses indices into the "code" array instead of strings. The addr.map((n) => codes[n]) line turns these indices into familiar stack trace strings.
  • vm is some v8 state value, idk what it's for but we don't use it 😄
  • tm is the timestamp. This is the number of nanoseconds since the start of the VM.

So, we need to run the ticks.push(stack.reverse()) statement only if the tm value (which is expressed in nanoseconds) is larger than our desired delay (which is in milliseconds).

Does that clarify things a bit?

@DylanC
Copy link
Contributor Author

DylanC commented Feb 11, 2020

@goto-bus-stop - A bit yeah. I'll look through the code base and make sense of how this is working regarding your comment.

@DylanC
Copy link
Contributor Author

DylanC commented Feb 12, 2020

@goto-bus-stop - I have the parameter collectDelay filtering all the way down to my copy of 0x.
https://github.com/DylanC/0x/tree/collect_timeout_feature

I also have some sudo code inside v8-log-to-ticks.js. Just need to implement a timestamp value for the collectDelay timeout and filter the results based on this value.

@DylanC
Copy link
Contributor Author

DylanC commented Feb 14, 2020

@goto-bus-stop - Just investigating the CI failure. Its not due to linting but detect-port.test seems to be failing.

test/detect-port.test.js ........[eval]:1
throw new Error()
^
Error
at [eval]:1:7
at Script.runInThisContext (vm.js:116:20)
at Object.runInThisContext (vm.js:306:38)
at Object. ([eval]-wrapper:9:26)
at Module._compile (internal/modules/cjs/loader.js:955:30)
at evalScript (internal/process/execution.js:80:25)
at internal/main/eval_string.js:23:3
Target subprocess error, code: 1

@goto-bus-stop
Copy link
Contributor

Tests fail locally for me with --collectDelay is not a recognised flag. CI installs 0x from npm and the version published on npm doesn't include that flag yet.

Normally we change the dependency in Flame's package.json to something like "0x": "your-github-username/0x#branchname" while the upstream 0x PR is still open, so npm installs 0x from your github fork. Once a new 0x version is published, we change it back to that version before merging the Flame PR.

index.js Outdated Show resolved Hide resolved
Copy link
Contributor

@goto-bus-stop goto-bus-stop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good! One of the commits accidentally included .DS_Store files, you can add that to the gitignore file in this PR.

test/cmd-collect-delay.test.js Outdated Show resolved Hide resolved
test/cmd-collect-delay.test.js Outdated Show resolved Hide resolved
test/cmd-collect-delay.test.js Outdated Show resolved Hide resolved
@DylanC
Copy link
Contributor Author

DylanC commented Mar 2, 2020

@goto-bus-stop - I have one error left to resolve for this change. In test/cmd-collect-delay.test.js:

test/detect-port.test.js ........[eval]:1
throw new Error()
^
Error
at [eval]:1:7
at Script.runInThisContext (vm.js:122:20)
at Object.runInThisContext (vm.js:329:38)
at Object. ([eval]-wrapper:6:22)
at Module._compile (internal/modules/cjs/loader.js:778:30)
at evalScript (internal/bootstrap/node.js:589:27)
at startup (internal/bootstrap/node.js:265:9)
at bootstrapNodeJSCore (internal/bootstrap/node.js:622:3)
Target subprocess error, code: 1

@goto-bus-stop
Copy link
Contributor

@DylanC that's actually expected output from a test that tests a failure case … would be good to silence that somehow 😅
Failures can be a bit obscured on CI because of the lengthy TAP output. I usually Ctrl+F for "not ok", which TAP prints when a test fails. Here, it looks like delayOneSecond is still found in the tree:

image

Timing on CI can be a real bother. Maybe the 300ms extra delay was still not enough… You could try adding a few more 100ms of wiggle room, if that doesn't work we may have to think of something smarter.

@DylanC
Copy link
Contributor Author

DylanC commented Mar 3, 2020

@goto-bus-stop - Not sure what happened but the CI seems to be stuck. 3 did pass after the last collect delay increase.

@goto-bus-stop
Copy link
Contributor

goto-bus-stop commented Mar 3, 2020

hmm yeah i'm looking at it too…seems like allsome of the other ones are still seeing the delayOneSecond node :/

there are two .DS_Store files in this PR still. could you get rid of them?

@goto-bus-stop
Copy link
Contributor

Some of the other tests became(?) flaky? Working on that in #108

@goto-bus-stop goto-bus-stop changed the title Collect timeout feature [PARKED] Collect timeout feature May 28, 2020
@jasnell jasnell changed the base branch from master to main August 21, 2020 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants