Autocomplete: Overhaul telemetry and fix various corner cases #412
Conversation
```diff
@@ -160,15 +160,13 @@ export class CodyCompletionItemProvider implements vscode.InlineCompletionItemProvider {
         this.abortOpenCompletions = () => abortController.abort()
     }

     CompletionLogger.clear()
```
Clearing the completion logger flushes all "open" completions, potentially logging completions that are still visible. We use this to measure how long completions stay visible, so we want to delay the flush as long as possible to get the most accurate timing.

Previously, every new completion request would immediately flush the queue. Now, however, a cache hit that matches the properties I outlined in the PR description and is thus "completed forward" won't, so we'll have a more accurate number for how long such completions are visible.
```diff
@@ -452,3 +512,8 @@ function createTracerForInvocation(tracer: ProvideInlineCompletionItemsTracer):
     tracer(data)
   }
 }

 function emptyCompletions(): vscode.InlineCompletionList {
     CompletionLogger.clear()
```
Whenever we early-return, we want to flush the completions queue. This conceptually makes sense: if we return an empty array at any point, no completions from us are visible anymore anyway.
```ts
// However we only log a completion as started if it's either served from cache _or_ the
// debounce interval has passed to ensure we don't log too many start events where we end up
// not doing any work at all
```
The completion logger now has separate `create` and `start` methods. This fixes an issue where we weren't properly measuring end-to-end latencies for our completions, because we only started the timer when we actively began doing work, which happens after both the pre-defined debounce time (30ms) and context retrieval (a few ms).

I don't expect this to have a big impact on the overall latencies, certainly not over 50ms, but we should aim to be as close to the real user experience as possible with these logs.

This won't change the overall number of events logged, though, because nothing is logged until the `start` method is called. We just start the timers a bit earlier.
```ts
const useLogIdFromPreviousCompletion =
    cachedCompletions?.logId && cachedCompletions?.logId === this.previousCompletionLogId
```
This line makes sure that "forwarded" completions don't get a new analytics event assigned. In the future, we might be able to do this with the new `enableForwardStability` option.
```ts
// When we serve a completion from the cache and create a new log
// id, we want to ensure to only refer to the new id for future
// cache retrievals. If we don't do this, every subsequent cache hit
// would otherwise no longer match the previous completion ID and we
// would log a new completion each time, even if the user just
// continues typing on the currently displayed completion.
if (logId !== cachedCompletions.logId) {
    this.config.cache?.updateLogId(cachedCompletions.logId, logId)
}
```
Pretty much a corner case, but updating the IDs in our cache is necessary in the following scenario:

- We start a completion request with ID `1` and display it, cool.
- Now the user does something else, which starts and shows other completion requests.
- Eventually, the prefix matches the first completion request again. The cache retrieves the item with ID `1`, but that won't match the `previousCompletionLogId`.
- This causes the code above to create a new ID, say `3`.
- On the next forward request, the cache would again resolve with ID `1` while the `previousCompletionLogId` is `3`, which would create yet another completion request...

To fix this, we update the cache to refer to the new log IDs whenever they change.
I think this is working as you described, and this comment isn't a blocker. But if we take a step back, maybe we should have some centralized system for `logId`. It looks like we're trying to sync log and cache IDs to avoid double logging when the cached completion is actually the same; maybe we could use some cache encode function for this purpose? Then it wouldn't be possible to have different IDs when the prefixes are the same.
Yeah, I don't like this bookkeeping either. It's going to be interesting to see what @sqs comes up with in #442, where he's currently, seemingly, not using a cache at all!

The only reason we need to keep the log ID around is to detect cases where the same completion is resolved again after some suffix changes. Maybe we can instead store the whole completion somewhere and compare that instead of the generated ID?
```ts
    ),
  },
})
CompletionLogger.networkRequestStarted(logId, contextResult.logSummary)
```
We previously appended the context summary to the completion started event. Now, however, we log that event before the context is fetched, so I decided to split this into a separate event. We haven't used any of this data yet, though.
vscode/src/completions/index.ts
Outdated
```ts
let isCompletionVisible = true
if (context.selectedCompletionInfo) {
    const currentText = document.getText(context.selectedCompletionInfo.range)
    const selectedText = context.selectedCompletionInfo.text
    if (!(currentText + results[0].content).startsWith(selectedText)) {
        isCompletionVisible = false
    }
}
```
This was a fun one 😅 In some scenarios, completions that we return will still not be shown in VS Code. This happens specifically when the completion we create does not match the text in an eventual VS Code completion popup. While we have ideas to improve this and use the suggested text to guide the completion, that would require a few more changes, so for now we try to detect when VS Code won't show the completion.

That, however, led to a different issue. We previously measured the duration of a completion as the time between when the completion was started and when it was first seen. Since a completion can now be fully loaded and returned but not immediately seen, we had to make some changes in the completion logger to add a separate timestamp so these are calculated correctly.
Just a reminder for myself: let's discuss this in our next 1:1. I don't think I fully understand this scenario.
```ts
// Log a debug event, this case should not happen in production
logCompletionEvent('acceptedUntrackedCompletion')
```
I managed to reproduce these cases sometimes. E.g. you accept a completion, then cmd+z to undo, type the same suffix again (without triggering any other completion in the meantime), and accept the completion again.

I wonder how often this happens in practice, though, so this just adds some events for us to see. If it happens often, we might need to ensure that this case creates a new log ID (and thus, conceptually, a new completion request).
@philipp-spiess sorry I didn't get to a review of this today. I haven't touched the logs in the Cody repo yet, so it'll take some time. I will review it first thing in the morning.

I'm afraid I'm not really qualified to give a review here 😕. I can check event names if helpful, but I don't think I'm familiar enough with the code or logic to validate the call sites or the edge cases, etc. Let me know if there's anything I can do to help.
Left a few minor comments and ideas.
```ts
expect(cache.get('foo \n ', true)?.logId).toBe('id2')
expect(cache.get('foo \n ', true)?.logId).toBe('id2')
expect(cache.get('foo \n', true)?.logId).toBe('id2')
```
Why 3 expect calls? Just to make sure that there is no toggle state behind the `CompletionsCache`?
Yeah I wanted to make sure all occurrences are updated.
vscode/src/completions/cache.ts
Outdated
```ts
public updateLogId(oldLogId: string, newLogId: string): void {
    const entries = this.cache.values()
    for (const value of entries) {
        if (value && typeof value !== 'string' && 'logId' in value && value.logId === oldLogId) {
```
What case do we catch with `typeof value !== 'string'`? It looks like `value` can't be a string if we trust the TS types here.
You're right, that was a mistake 🙈
Ha, thanks! I tagged you mostly so that you're aware of some of the changes that we'll be doing. :)
Two quick fixes for regressions I introduced in #412:

- The loading indicator felt flaky, so this PR makes sure we start the indicator a bit earlier and terminate it in any case.
- The opt-out for the deletion case was wrong, causing it to skip the condition if the last action was a deletion and there was no cache response. This was not intended to change for now. The correct behavior should still be: if there is a deletion, ONLY continue if we have an exact cache response; otherwise, return no completions.

## Test plan

- Tested locally.
Closes #178
This PR fixes a bunch of issues with the completions cache that should make it much easier to understand now (although there are still a few corner cases). The analytics code is quite tangled with the implementation right now, but I want to get the logic right first before we do some cleanup.

Conceptually, the big change is that we now always create a new completion entry for new completion requests. The only exception is when the cache resolves to the very same completion that was already displayed. This special case is necessary so we don't send too many completion requests when we forward-type a completion like so:
Screen.Recording.2023-07-27.at.13.40.19.mov
There are a bunch of other small fixes that I noticed were necessary that I’m going to explain inline.
Test plan