Use the raw_lines field in stackprof for more precise lines #483

dalehamel · 2024-10-11T02:38:07Z

Since tmm1/stackprof#213 the actual source line within the method, not just the method start line, should be available.

From the existing simple.rb:

speedscope/sample/programs/ruby/simple.rb

Lines 32 to 34 in d69f3d0

On main, it just gives the method start line:

With this patch, we get the actual line number of the call:

I've added some rhudimentary tests, to cover both older stackprof files that are missing raw_lines, and newer ones that have it. I basically just walked "children[0]" manually until i found a node with a line. Since the file is checked in, i think this is fine.

EDIT: addressing frame collisions

As called out in #483 (comment), my first commit could lead to collisions due to frames that were not identical using the same key. This is addressed in 136dd99

Looking at the following code:

speedscope/sample/programs/ruby/simple.rb

Lines 3 to 10 in d69f3d0

    
           def a 
        
             for i in 0..100 do 
        
               b 
        
               c 
        
               e 
        
             end 
        
           end

On main, we only see the frame info showing a calling from line 4:

But now, we have two different frames for a, one with line 5, and one with line 8:

In a more pathological example, I modified simple.rb to insert a sleep on line 16 to show how previously we would in fact lose information. Here is a graph on main:

And the other tree for the code path where it doesn't sleep:

Here it is with the patch:

And the other tree for the code path where it doesn't sleep:

We can see that with the patch, it correctly shows the call site of "sleep" and is able to distinguish these code paths, one for line 16 where sleep is actually called, and one for line 13 for the other code path and indeed another on line 14 for the other instance of b. Previously these would all have had the same value. The graph has more nodes, and we are able to have different instances of the "b" frame.

Since tmm1/stackprof#213 the actual source line within the method, not just the method start line, should be available.

jlfwong · 2024-10-11T04:03:45Z

Hi @dalehamel! Thanks for the patch — happy to merge this after tests are fixed

It looks like the new test file wasn't checked in:

Summary of all failing tests
FAIL src/import/stackprof.test.ts
  ● reads correct line numbers › uses the most precise iseq line if raw_lines is available

    ENOENT: no such file or directory, open './sample/profiles/stackprof/wall_with_raw_lines.json'

      at Object.openSync (node:fs:596:3)
      at readFileSync (node:fs:464:35)
      at src/import/stackprof.test.ts:64:37
      at src/import/stackprof.test.ts:8:71
      at Object.<anonymous>.__awaiter (src/import/stackprof.test.ts:4:12)
      at Object.<anonymous> (src/import/stackprof.test.ts:63:80)

jlfwong · 2024-10-11T04:06:58Z

Oh hm, one consideration here: other profile imports intentionally use the line number of the method definition, not the line of the invocation. This prevent things from getting confusing when e.g. you have stacks that look like:

a;b
a;c

def a
  b
  c
end

If a() calls b() on line 7, and a() calls c() on line 8, what should be displayed here? Speedscope considers frames with the same key to be identical for display, and collapses them into a single entity

dalehamel · 2024-10-11T10:18:23Z

It looks like the new test file wasn't checked in:

Whoops, i amended the commit but forgot to push.

dalehamel · 2024-10-11T10:21:12Z

Oh hm, one consideration here: other profile imports intentionally use the line number of the method definition, not the line of the invocation. This prevent things from getting confusing when e.g. you have stacks that look like:
a;b
a;c
def a
  b
  c
end
If a() calls b() on line 7, and a() calls c() on line 8, what should be displayed here? Speedscope considers frames with the same key to be identical for display, and collapses them into a single entity

I'll need to look into this further and get back to you

dalehamel · 2024-10-11T11:03:33Z

One possible solution would be to include the line number as part of the key. Looks like it accepts a string, so a quick and dirty way to prevent the collisions would be to append the line number to the frame id, eg:

diff --git a/src/import/stackprof.ts b/src/import/stackprof.ts
index bd20c5d..e4cd7f5 100644
--- a/src/import/stackprof.ts
+++ b/src/import/stackprof.ts
@@ -39,8 +39,9 @@ export function importFromStackprof(stackprofProfile: StackprofProfile): Profile
       if (frameName == null) {
         frameName = '(unknown)'
       }
+      const key = id.toString(10) + "_" + lineNo.toString(10);
       const frame = {
-        key: id,
+        key: key,
         ...frames[id],
         line: lineNo,
         name: frameName,

This addresses some weirdness I was seeing in one of my profiles which I think was what you called out. This is one solution to ensure that they are keyed differently. I'll get into a bit more detail about this in a bit.

The stackprof frame key is updated to include the frame id in the upper 48 bits, and line number in the lower 16 bits. To prevent truncation when casting to a number, this resulting BigInt is converted to a string. This ensures that frames with the same start line but different call lines do not collide when deduping frames.

dalehamel · 2024-10-11T14:09:24Z

@jlfwong please take another look - i updated the description to address the collision issue you pointed out. I believe I have come up with a keying strategy that should resolve this.

Unfortunately it changes the key from a number to a string, as a general rule I prefer integers for performance reasons but in this case due to number only permitting 53 bits, there is no way to stuff all the info in without the possibility of truncation. So use a similar trick as in tmm1/stackprof#213 to stuff the frame and line number together, but have to convert the result to a string.

If we were to update

speedscope/src/lib/utils.ts

Lines 46 to 49 in d69f3d0

    
             readonly key: string | number 
        
           } 
        
           export class KeyedSet<T extends HasKey> implements Iterable<T> { 
        
             private map = new Map<string | number, T>()

to permit BigInt in addition to string and number, we can avoid this conversion but I'm not sure if there might be unintended consequences to this. For now, since the keyed set accepts strings, I just use convert the resulting BigInt to its base 10 string representation.

dalehamel · 2024-10-12T01:02:01Z

src/import/stackprof.ts

+        // Use a similar trick to https://github.com/tmm1/stackprof/pull/213
+        // generates a unique key for the name id + call line.
+        // Converts to a string to prevent truncation from BigInt to number
+        key: ((BigInt(id) << BigInt(16)) | BigInt(lineNo ? lineNo : 0)).toString(10),


I'm accepting Aaron's argument here about arm and x86 reserving these 16 bits as fact. Using bigint to get a 64 bit container for the bits and so that the bit shifting won't truncate it.

Instead of shifting 48 though, i'm going the other way and shoving the frame ID into the upper bits. This pushes them into the more significant bits to make room for the line number, for which 16 bits is ample.

Unfortunately, I have to convert it to a string because 64 bits is more than we can depend on for a "number", for which only 53 bits are available.

So in the end, all this fancy math leads to a hash key that is about as useful as just converting both to strings and concatenating them.

I am tempted to go with something like this, but there is a small possibility of collisions with it. Keeping it in a numerical type should be faster and use less space than strings though:

From baa30c0ba0130f4638db46585ddcaaa230864053 Mon Sep 17 00:00:00 2001 From: Dale Hamel <[email protected]> Date: Fri, 11 Oct 2024 20:55:02 -0400 Subject: [PATCH] Build stackprof frame key with rhudimentary "hash" Combine frame id with line number using bitwise to try and keep the key within 53 bits. Collisions should be unlikely but this won't win any awards for best hasher. Tries to optimize for the frame address being at most 48 bits, and the line number being unable to exceed 16 bits. XOR with itself and prime number before bit shifting to further reduce loss of precision Should be much faster than string operations, but might come at the cost of correctness in rare cases. --- .../__snapshots__/stackprof.test.ts.snap | 90 +++++++++---------- src/import/stackprof.ts | 23 ++++- 2 files changed, 64 insertions(+), 49 deletions(-) diff --git a/src/import/stackprof.ts b/src/import/stackprof.ts index 93c4c94..db5bbf7 100644 --- a/src/import/stackprof.ts +++ b/src/import/stackprof.ts @@ -40,10 +40,7 @@ export function importFromStackprof(stackprofProfile: StackprofProfile): Profile frameName = '(unknown)' } const frame = { - // Use a similar trick to https://github.com/tmm1/stackprof/pull/213 - // generates a unique key for the name id + call line. - // Converts to a string to prevent truncation from BigInt to number - key: ((BigInt(id) << BigInt(16)) | BigInt(lineNo ? lineNo : 0)).toString(10), + key: bitShiftXorNumbers(id, lineNo ? lineNo : 0), ...frames[id], line: lineNo, name: frameName, @@ -76,3 +73,21 @@ export function importFromStackprof(stackprofProfile: StackprofProfile): Profile return profile.build() } + +/* +This is not really generalizable, just meant to be a relatively fast way to +combine the two numbers while not losing precision. We can't use the same +representation that stackprof uses of just bit shifting, so we "hash" them +by bit shifting and Xoring them. +*/ +function bitShiftXorNumbers(num1: number, num2: number): number { + const prime1 = 31 // A small prime number to "seed" the "hash" + const prime2 = 37 // Another small prime number + + // Use a combination of multiplication and addition to reduce collision risk + const hash1 = (num1 * prime1) ^ (num1 >> 16) // XOR with a right shift of num1 + const hash2 = (num2 * prime2) ^ (num2 << 8) // XOR with a left shift of num2 + + // Shift hash1 left to make room for hash2 and combine them + return (hash1 << 16) ^ hash2 +} -- 2.45.1

Maybe start with strings and switch to a numerical approach like this if perf becomes an issue?

dalehamel · 2024-10-18T19:05:18Z

@jlfwong - gentle nudge as it has been a week, do you have a preference on the key generation approaches? FWIW we've been using the string based approach in prod and it seems to be fine performance-wise.

dalehamel · 2024-10-28T17:58:00Z

@jlfwong last ping I'll give, when you are able to take a look let me know but I'm moving on to other work and running a forked version for now.

jlfwong · 2024-10-31T17:48:47Z

Hey, sorry I've been MIA

Only have a couple minutes right now, but I think I may not have communicated my concern here correctly.

Oh hm, one consideration here: other profile imports intentionally use the line number of the method definition, not the line of the invocation. This prevent things from getting confusing when e.g. you have stacks that look like:

a;b
a;c
def a
b
c
end
If a() calls b() on line 7, and a() calls c() on line 8, what should be displayed here? Speedscope considers frames with the same key to be identical for display, and collapses them into a single entity

When I was describing the problem here, my concern is that, IMO, we want these two "a" frames to be collapsed. If I understand this change correctly (which I might not be), then it would treat those two "a" frames as distinct by giving them different keys since their line numbers will be different if we use their invocation sites of b & c respectively rather than using the definition line of a in both cases.

This has minor annoyances for visual display, but creates bigger problems for aggregate statistics. For example, if this was the whole trace, then the sandwich view would read "50% a, 50% a" instead of "100% a" for total time.

Am I misunderstanding the nature of this change?

dalehamel · 2024-10-31T18:13:11Z

IMO, we want these two "a" frames to be collapsed

FWIW I rolled this out internally a couple of weeks ago, and the response has been positive - developers that I work with seem to prefer to be able to follow the execution more precisely by having the actual call sites in the graphical views.

then it would treat those two "a" frames as distinct by giving them different keys since their line numbers will be different if we use their invocation sites of b & c respectively rather than using the definition line of a in both cases.

Yes

This has minor annoyances for visual display, but creates bigger problems for aggregate statistics. For example, if this was the whole trace, then the sandwich view would read "50% a, 50% a" instead of "100% a" for total time.

Yes, this does happen. They are displayed with the same name, but then on clicking on them, you can see there are different line numbers. I agree this messes up the "top table" part of the sandwich view, as a frame that has multiple call sites can now be showed below a frame that only has one, but actually has a lower total overall weight. Perhaps the top table code could be amended to just use the frame names as the "key" when performing the aggregation in the table, but show the distinct call sites in the actual sandwich graphical view.

In the "pprof" format, it is possible to store both the method definition line as well as the actual call line. Perhaps a compromise would be to amend the data structures to be able to store both, and have an option to toggle exact line numbers, or line numbers of the method definition only.

Use the raw_lines field in stackprof for more precise lines

5175aa1

Since tmm1/stackprof#213 the actual source line within the method, not just the method start line, should be available.

dalehamel force-pushed the stackprof-linenos branch from fc98015 to 5175aa1 Compare October 11, 2024 10:17

dalehamel commented Oct 12, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use the raw_lines field in stackprof for more precise lines #483

Use the raw_lines field in stackprof for more precise lines #483

dalehamel commented Oct 11, 2024 •

edited

Loading

jlfwong commented Oct 11, 2024 •

edited

Loading

jlfwong commented Oct 11, 2024 •

edited

Loading

dalehamel commented Oct 11, 2024

dalehamel commented Oct 11, 2024

dalehamel commented Oct 11, 2024

dalehamel commented Oct 11, 2024

dalehamel Oct 12, 2024

dalehamel commented Oct 18, 2024

dalehamel commented Oct 28, 2024

jlfwong commented Oct 31, 2024 •

edited

Loading

dalehamel commented Oct 31, 2024

Use the raw_lines field in stackprof for more precise lines #483

Are you sure you want to change the base?

Use the raw_lines field in stackprof for more precise lines #483

Conversation

dalehamel commented Oct 11, 2024 • edited Loading

jlfwong commented Oct 11, 2024 • edited Loading

jlfwong commented Oct 11, 2024 • edited Loading

dalehamel commented Oct 11, 2024

dalehamel commented Oct 11, 2024

dalehamel commented Oct 11, 2024

dalehamel commented Oct 11, 2024

dalehamel Oct 12, 2024

Choose a reason for hiding this comment

dalehamel commented Oct 18, 2024

dalehamel commented Oct 28, 2024

jlfwong commented Oct 31, 2024 • edited Loading

dalehamel commented Oct 31, 2024

dalehamel commented Oct 11, 2024 •

edited

Loading

jlfwong commented Oct 11, 2024 •

edited

Loading

jlfwong commented Oct 11, 2024 •

edited

Loading

jlfwong commented Oct 31, 2024 •

edited

Loading