How should gptel distinguish between LLM responses and user text? #321

karthink · 2024-05-23T00:10:33Z

karthink
May 23, 2024
Maintainer

Currently gptel tags the text of LLM responses so it can distinguish between its responses and user prompts. The exact way it does this in Elisp is irrelevant (or not yet relevant) to this discussion. As it turns out, there are several subtleties to this behavior that are unresolved.

To figure these out, I would like your input and feedback on the following two questions:

1. If you move the cursor into a response region and type in text, should that new text be considered part of the response, or should it break the response into two regions separated by a new user prompt?
2. If you copy some text from a response region and yank it -- elsewhere into this buffer or into another one -- should it continue to be recognized by gptel as an LLM response, or is it now part of the user prompt?

Before you reply: I've heard from users who believe it should obviously work this way, and would not understand why anyone would want the opposite behavior... for both values of this. Consider that there are situations where both possible behaviors are useful. The question is about your mental model of the response: is the LLM response a feature of the text itself, or is it a feature of the position and context of the text in the buffer? (As you might expect, these correspond roughly to two ways of marking text in Emacs buffers, with text-properties or overlays.)

jwr · 2024-05-23T17:29:12Z

jwr
May 23, 2024

Since you asked for feedback: I have two mental models when working in Emacs:

If I'm looking at a gptel conversational buffer, I expect behavior similar to all the online "AI chat" interfaces. I don't expect the responses to be editable at all. If they are, the answer to your question 1 would be "I expect editing a response would break it into two regions separated by a new user prompt". The answer to your question 2 is that it should always be copied as "just text", with no extra magic. In other words, it's my text now if I copied it.
If I'm working in my own buffer, my mental model is that it's all "just text". My text. I do not want any differentiation between what I typed and what the LLM produced. If it's my buffer, it's my text. So the answer to both of your questions is the same: I do not expect or want any extra processing, I just want to send whatever is selected (or whatever is before the point) with the directive prepended. I do realize that this might sound limiting to some people, because you lose the ability to send multiple prompt/response pairs this way, but that's what conversational buffers are for. If I work in my own buffer, I do not want a "conversation", I'm using an LLM to work on my data, and I do not want any behind-the-scenes magic. It needs to be simple and predictable. As a measuring stick: looking at a buffer displayed in Emacs, you should be able to immediately tell what will get sent to the LLM. If you can't without asking about previous queries, state and history, something is not OK.

1 reply

karthink Jun 27, 2024
Maintainer Author

If they are, the answer to your question 1 would be "I expect editing a response would break it into two regions separated by a new user prompt". The answer to your question 2 is that it should always be copied as "just text", with no extra magic.

Once you set gptel-track-response appropriately, this is how it behaves.

kurnevsky · 2024-05-26T14:36:11Z

kurnevsky
May 26, 2024

I'd consider it a part of the response - sometimes it's useful to edit the model's response to something different (e.g. when it's highly censored) and then ask a follow-up question.

6 replies

karthink Jun 27, 2024
Maintainer Author

The current way that responses are tracked (I'm talking only about the dedicated chat buffer) is through text properties. This isn't really ideal if I'm being honest, because sometimes I would like to add something to the very beginning of the text, or make an unorthodox modification to the buffer, and when I do a dry-run I find out that the history has been messed up in one way or another.

The problem here is not the text-properties, it's their level of stickiness. As described in the opening question, there are two mental models possible here, and the same history will appear consistent or messed up depending on which model you have.

If you believe that typing in text in the middle of a previous response should count as part of that response, then gptel's current behavior will be incorrect and/or confusing.

On the other hand, if you believe that anything you type in should be assigned the user role, the behavior will seem consistent.

i.e. is the "assistant" role of the response a property of the actual content, or of its location in the buffer? Text-properties can simulate either behavior, but there are problems with both.

gptel used to work in the following way: text added in the middle of the response was assigned the "assistant" role.

But it created lots of edge cases where ostensibly "user" text was also assigned the assistant role. The current behavior might be less than ideal but it's unambiguous: anything you add is always assigned the "user" role.

My suggestion would have to track the conversation through the markdown or org-mode headers, but that would force users to use them, and it would invalidate some use-cases I suppose.

gptel is meant to fit in seamlessly in any workflow/buffer, so I don't want to add syntax.

daedsidog Jun 28, 2024

The problem here is not the text-properties, it's their level of stickiness. As described in the opening question, there are two mental models possible here, and the same history will appear consistent or messed up depending on which model you have.

My mistake. In that case I am in favor of the second model, if I understood it correctly.

Sometimes, I wish to start the chat with an assistant response even, then write my response below, and then send the query where I already filled his initial response. This is quite orthogonal for the models listed, but something to consider nonetheless.

karthink Jun 28, 2024
Maintainer Author

In that case I am in favor of the second model, if I understood it correctly.

I assume by "second model" you mean the one where anything typed by the user (anywhere in the buffer) is unconditionally assigned the user role, which is gptel's current behavior.

Sometimes, I wish to start the chat with an assistant response even, then write my response below, and then send the query where I already filled his initial response. This is quite orthogonal for the models listed, but something to consider nonetheless.

Yeah, I would like this feature too. I was hoping that there would be some way to do this seamlessly using Emacs' regular editing features, and without adding syntax to the chat. But I haven't found a way. gptel could always provide commands like gptel-assign-user-role and gptel-assign-assistant-role that you can apply to text regions, but I don't like this design, it feels cumbersome. There must be a better way.

For now, you can do this manually, by applying the gptel text property to text you want to mark as a response:

(put-text-property
 (region-beginning) (region-end)
 'gptel 'response)

(Of course, the internal method used to distinguish responses may change in the future, so this method is not guaranteed to work indefinitely.)
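As a minimal sketch, the manual step above could be wrapped in two region commands. The `my/` names are hypothetical and not part of gptel, and the sketch assumes that responses are marked by the `gptel` text property having the value `response`, which may change as noted.

```elisp
;; Hypothetical helpers, not part of gptel.  They assume that gptel
;; marks responses with the text property `gptel' set to `response'.

(defun my/gptel-mark-as-response (beg end)
  "Mark the text between BEG and END as an LLM response."
  (interactive "r")
  (put-text-property beg end 'gptel 'response))

(defun my/gptel-mark-as-user (beg end)
  "Remove the response marking from the text between BEG and END."
  (interactive "r")
  ;; `remove-text-properties' takes a plist; the values are ignored.
  (remove-text-properties beg end '(gptel nil)))
```

Select a region and run `M-x my/gptel-mark-as-response` to mark it, or `M-x my/gptel-mark-as-user` to clear the mark.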

daedsidog Jun 28, 2024

I assume by "second model" you mean the one where anything typed by the user (anywhere in the buffer) is unconditionally assigned the user role, which is gptel's current behavior.

Seems like I got a lot of the models mixed between multiple replies here, so I'll just make it clear: I would prefer that any change I make to the chat buffer would be seen as a change made to the conversational history. If I add text to the assistant's response, it still is his response. If I yank part of his text around, and add it in MY response, it will be MY response (in this case, the text property of the response gets dropped).

The current behavior is something I usually try to avoid.

For now, you can do this manually, by applying the gptel text property to text you want to mark as a response:

This is how I've been doing what I described so far.

In my opinion, the chat buffer is a chat, and a chat works by having participants and their messages appearing in a coherent, understandable order. I understand there may be other use cases, but I think they should be relegated to be done outside the chat buffer.

Although, I am strictly thinking in terms of the OpenAI API interface, i.e. with the roles and whatnot. What I prefer might not work well for other backends.

karthink Jun 28, 2024
Maintainer Author

In my opinion, the chat buffer is a chat, and a chat works by having participants and their messages appearing in a coherent, understandable order.

If a chat buffer is a chat, then it does not make sense to modify its contents except by appending at the end (like in most web interfaces), and then both of gptel's tracking behaviors are identical. It does not make sense to change the context chunks mid way through the chat either.

karthink · 2024-06-28T23:41:36Z

karthink
Jun 28, 2024
Maintainer Author

> I assume by "second model" you mean the one where anything typed by the user (anywhere in the buffer) is unconditionally assigned the user role, which is gptel's current behavior. Seems like I got a lot of the models mixed between multiple replies here, so I'll just make it clear: I would prefer that any change I make to the chat buffer would be seen as a change made to the conversational history. If I add text to the assistant's response, it still is his response. If I yank part of his text around, and add it in MY response, it will be MY response. The current behavior is something I usually try to avoid.

Right. Implicitly you are assuming that the response == a region of the buffer, and not the contents of the response text itself. This is the behavior you would expect, for example, if you used an overlay to track the response bounds. I understand the appeal of this approach, but when you implement it it turns out not to work very well. gptel used to work this way and it was fine some of the time (~80%), but there were too many edge cases, and it was easy to lose this overlay (or overlay-like behavior) because of the various things Emacs and other minor modes do in buffers. If I can find a way to implement this behavior robustly I'll add it again.

0 replies

Inkbottle007 · 2024-07-15T17:42:33Z

Inkbottle007
Jul 15, 2024

potential need for quoting tags

Indeed, we really need to be able to edit the "response" part, which in the current case will fragment it and require (put-text-property (region-beginning) (region-end) 'gptel 'response). But we must be 100% thorough, one mistake and everything is a mess. There are many cases when you want to edit an LLM response: The response is too long and you want to cut unnecessary parts, or the response is inaccurate and you prefer to fix it manually rather than telling the LLM it was wrong. My interaction structure with org-mode is *** ASSISTANT: and *** USER:. But you can't expect the LLM to follow that structure, so you have to fix it now and then. As it stands, this is not an option. Additionally, if you type C-' in some code block (org-mode), the code will automatically be edited.

Also, you might want to use part of the response in your request. Currently, I paste it in the terminal with cat | wl-copy. If you forget to do that, well, don't.

I think the best solution would be a visible structure, similar to HTML/XML tags. Similar to JSON. Similar to Emacs' S-expressions. I mean using opening and closing tags.

That structure could be orthogonal to the existing structure in the document: It would be ignored by org-mode and conversely, the new tagging system would ignore the underlying org-mode structure.

So that would be <gptel-response> / </gptel-response>.

There would be no ambiguity anymore: We could edit, copy, or do whatever we want, provided we don't meddle with the intertwined new tags.

(Note: if we want to be able to quote the tags of the new structure, which is far from unimaginable, we can add a UUID to the tags: <gptel-response-123> / </gptel-response-123>, with the UUID changing from response to response.)

5 replies

daedsidog Jul 15, 2024

Perhaps instead of tags, a very minor & non-intrusive color-coded overlay, with the ability to switch roles?

Inkbottle007 Jul 15, 2024

I've experimented with using color-coded overlays for distinguishing LLM responses, but it was problematic with modes like org-indent-mode and other font-lock-modes introducing their own color changes. The visual results were often messy and confusing. Users would find it difficult to interpret the significance of color variations, such as whether they indicated text edits or were a result of the editing environment. There were also many blank spaces interspersed within the text.

While non-intrusive color-coding seems like a simple solution, in practice, it introduces ambiguity and fails to provide a clear, understandable indication of LLM responses and user edits.

Having used gptel extensively and given this matter a lot of thought, I believe a robust tagging system might still be necessary for clarity and reliability. This wouldn’t even preclude the use of a color-coded overlay for additional visual aid.

Inkbottle007 Jul 16, 2024

I've tested the PR and set (setq gptel-highlight-assistant-responses t). The responses are highlighted in blue. However, there are some critical limitations.

When I introduce a newline in the middle of a response, it fragments the text. This causes the computed JSON payload to change from assistant to assistant; user; assistant. This is evident in the changes within :GPTEL_BOUNDS: when saving the buffer:

:GPTEL_BOUNDS: ... (16200 . 18073) (18543 . 20637)

turns into:

:GPTEL_BOUNDS: ... (16216 . 18089) (18559 . 19965) (19966 . 20654)

The color overlay alone does not provide any indication of this fragmentation. There's no visual distinction to show that the assistant's response has been broken by user input.

For these reasons, I believe a robust tagging system with opening and closing tags is the simplest solution. Initially, I considered invisible tags hidden in text properties, possibly within zero-width characters, with the background color spanning from the open tag to the close tag.

However, I realized that visible open and close tags would be even simpler and clearer, making it easier for users to manage text sections.

But, that's just my opinion.

(Here is the exact version I've tested: c242bdf)

karthink Jul 16, 2024
Maintainer Author

I think the best solution would be a visible structure, similar to HTML/XML tags. Similar to JSON. Similar to Emacs' S-expressions. I mean using opening and closing tags.

I've tested the PR and set (setq gptel-highlight-assistant-responses t). The responses are highlighted in blue. However, there are some critical limitations.

I don't think that either of these is a good fit for gptel's goal, which is to blend in as seamlessly and invisibly as possible into other Emacs workflows. I don't want to introduce markup, which is disruptive to this goal, for example when you are using gptel in a document you intend to distribute (via export or otherwise) or submit. Between properties and local variables, Org and Emacs already have non-intrusive mechanisms for storing metadata, and gptel uses them.

The color overlay alone does not provide any indication of this fragmentation. There's no visual distinction to show that the assistant's response has been broken by user input.

Indeed, this is why coloring responses isn't part of gptel right now.

For these reasons, I believe a robust tagging system with opening and closing tags is the simplest solution.

This makes sense only if you think of the purpose of the buffer/session as LLM interaction. Other packages have taken this route; I have a different take. LLM interaction is one tool among many: you don't use buffer text (tags) to give it explicit instructions any more than you instruct an LSP server or linter this way.

Please see my next comment (to be written shortly) for more explanation and alternative approaches.

daedsidog Jul 16, 2024

When I introduce a newline in the middle of a response, it fragments the text. This causes the computed JSON payload to change from assistant to assistant; user; assistant. This is evident in the changes within :GPTEL_BOUNDS: when saving the buffer:

Must be a bug in my PR then. As far as I'm aware the parsing is done by the gptel response property, so a fragmentation should be visible.

EDIT: Ah, I see what you mean now. Yes, without a change to the stickiness, adding a newline fragments the response and gives no visible indication of it. A fix would be to trim empty user messages.

karthink · 2024-07-16T01:34:50Z

karthink
Jul 16, 2024
Maintainer Author

Indeed, we really need to be able to edit the "response" part, which in the current case will fragment it

@Inkbottle007, @daedsidog I think you are taking the "response == buffer region" semantic model for granted. This is why indicating responses visually, copying text (etc) doesn't work how you expect.

The "response == buffer region" model is fine, and I'm okay switching to it since it's the popular one so far. The question is how to implement it.

The behavior you want maps 1:1 to overlays, so the easy fix would be to use overlays instead of (or in addition to) text properties to demarcate response boundaries. Then everything, including copying response text to other buffers and visual indications of response regions will work as you expect.
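To illustrate the difference for copied text, here is a small sketch. The function name and the `my-role` overlay property are made up for this demo, and the `gptel` property value `response` is assumed to be how responses are currently marked. Text properties travel with a copy of the text, while overlay properties belong to the buffer region and stay behind.

```elisp
;; Demo only: `my/copied-roles' and the `my-role' overlay property are
;; hypothetical names, not part of gptel.
(defun my/copied-roles ()
  "Return (TEXT-PROP OVERLAY-PROP) as found on a copy of marked text."
  (with-temp-buffer
    ;; Mark the same text twice: once with a text property...
    (insert (propertize "reply" 'gptel 'response))
    ;; ...and once with an overlay covering the same region.
    (overlay-put (make-overlay (point-min) (point-max)) 'my-role 'response)
    (let ((copy (buffer-substring (point-min) (point-max))))
      ;; The text property is part of the copied string; the overlay
      ;; property is not.
      (list (get-text-property 0 'gptel copy)
            (get-text-property 0 'my-role copy)))))
```

Evaluating `(my/copied-roles)` returns `(response nil)`: only the text property survives the copy.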

Switching to overlays is still a fair amount of work, though.

As a precursor and preview to that, you can try the following:

(setf (alist-get 'gptel text-property-default-nonsticky nil 'remove) nil)

This should do most of what you want, but it introduces some hidden gotchas, especially in markdown-mode.

1 reply

daedsidog Jul 16, 2024

As a precursor and preview to that, you can try the following:

I do have that as default behavior in my config. But honestly I have no qualms with how the responses are broken down with user meddling, so long as I get to see it. What bothered me more is how responses tend to get mixed with my queries over time, unknowingly.

I think I slowly start understanding what the crux of the issue is: how responses are tracked elsewhere (am I right?) Frankly, I can't think of any use-case where one would need to keep track of a response anywhere outside of the gptel chatting buffer... so far it has been nothing but a nuisance for me (e.g. get code from chat buffer, slightly modify, add an instruction to modify it for refactor, and meet with a completely mangled query).

I'm not saying there isn't a use-case, I would just very much like to see one for example. I suppose the ability to chat in any arbitrary buffer?

Inkbottle007 · 2024-07-16T02:10:41Z

Inkbottle007
Jul 16, 2024

I don't think that either of these is a good fit for gptel's goal, which is to blend in as seamlessly and invisibly as possible into other Emacs workflows. I don't want to introduce markup, which is disruptive to this goal, for example when you are using gptel in a document you intend to distribute (via export or otherwise) or submit. Between properties and local variables, Org and Emacs already have non-intrusive mechanisms for storing metadata, and gptel uses them.

What about using invisible open/close tags? Just as the response property is currently applied to every character, we could have an open-tag in the text property of the first character of the response region and a close-tag in the text property of the last character.

This approach would address the issue where subsequent user text is mistaken for a response, as discussed earlier. Importantly, it wouldn't appear in the document's actual text, maintaining the goal of non-intrusiveness.

Of course, if the user meddles with those characters, everything will end up mixed up again. That's why I thought that showing those boundaries would help. However, I understand the intention to keep gptel seamless and invisible within the workflow.

1 reply

karthink Aug 21, 2024
Maintainer Author

The current plan is to switch to using overlays instead of text-properties to track responses. This appears to map more closely to users' mental model of what a response is, and it should also make it easy to demarcate responses visually and consistently (we can just color the overlay).

karthink · 2024-09-02T17:50:54Z

karthink
Sep 2, 2024
Maintainer Author

In the feature-overlays branch, I've switched gptel over to using overlays (as discussed above), resolving a bunch of expectation mismatches at once. You should now be able to

Actually edit responses, i.e. the edits will be part of the response
Highlight/color-code responses accurately and without jank (will add a user option)
Copy responses to other buffers as plain text

It needs some testing, so if you're interested please switch to the feature-overlays branch and give it a go. Once I'm confident that all edge cases are covered I'll merge it into master.

0 replies

karthink · 2024-09-03T20:58:14Z

karthink
Sep 3, 2024
Maintainer Author

I've found the first problem with using overlays to track responses -- if you kill text and undo, or even just undo and redo, the overlays don't come back so the tracking is gone.

3 replies

daedsidog Sep 3, 2024

What do you mean by "undo and redo"? Killing parts of the region and undoing it works correctly for me. EDIT: Ah, killing from the edges of the overlays does not restore them. That is pretty inconvenient.

karthink Sep 4, 2024
Maintainer Author

Similarly, if you undo and redo after gptel inserts a response, so that the response is removed and inserted again, the overlay is gone. This is a problem.

Inkbottle007 Oct 14, 2024

I agree that while it resolves some issues, it introduces some others.

Inkbottle007 · 2024-11-04T19:38:01Z

Inkbottle007
Nov 4, 2024

I have a workflow that works. I use a simple function that unambiguously highlights the attributions and then you just have to fix the inconsistencies by hand (using put-text-property as karthink suggested earlier). The very simple logic of the highlighting function says that it should work. Also, I've thoroughly tested the edge cases.

I think we should keep the text property based attribution system.

Since the text is natural language, I don't think there is a single (unified) solution that addresses all cases. However a hybrid solution like the workflow described here should be sufficient. There is the question of hooking (the highlighting function on changes in the buffer) but I don't know how this could be done without being CPU intensive.

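(require 'text-property-search)

(defun gptel-highlight-responses ()
  "Highlight response segments with overlays."
  (interactive)
  ;; Clear overlays from a previous run so repeated calls do not stack them.
  (remove-overlays (point-min) (point-max) 'gptel-response-overlay t)
  (save-excursion
    (goto-char (point-max))
    ;; Bind `prop' locally instead of setting a global variable.
    (let (prop)
      ;; Walk backward, alternating between response and non-response runs:
      ;; a predicate of t matches `response', nil matches everything else.
      (while (setq prop (text-property-search-backward
                         'gptel 'response
                         (when (get-char-property (max (point-min) (1- (point)))
                                                  'gptel)
                           t)))
        (let ((role (if (prop-match-value prop) "assistant" "user"))
              (overlay (make-overlay (prop-match-beginning prop)
                                     (prop-match-end prop))))
          (overlay-put overlay 'face
                       `(:background ,(if (equal role "assistant")
                                          "lightblue"
                                        "lightgreen")
                                     :extend t))
          (overlay-put overlay 'gptel-response-overlay t))))))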

Additional information can be found here.

10 replies

Inkbottle007 Nov 7, 2024

@daedsidog
I've just managed to use pr-343 (f305172e2b104e703954cd0aa2add124714f7ddb), and from what I understand, the synergy of techniques you employed has led to a very successful outcome: it seems to work like a charm.

I turned to using overlays because, like @karthink, I couldn't get past the overcrowded face space issue.

But it seems you yourself have found a way around that.

And there is also the question of the stickiness, which you have set right. (Or maybe it's something else that produces this excellent result.)

Those two factors put together seem to produce the right outcome.

daedsidog Nov 7, 2024

@Inkbottle007 Stickiness seems to be a matter of taste.

With stickiness enabled, any text you insert will be part of the target role. This is handy if you want to modify the model's response.

Without stickiness, your inserted text will "fragment" the model response into two different ones, and your text will be inserted between them. I personally see very little benefit to this.

The only downside of having stickiness is that sometimes you mistakenly delete the newlines at the end of the response, so when you attempt to write a prompt, it will be considered part of the previous assistant response. It happens rarely, and I consider it worth the cost: since we are able to swap roles, we can control it.

Inkbottle007 Nov 7, 2024

@daedsidog If I understand correctly with f305172e2b10 stickiness is enabled by default, right? Is it customizable? I would like to try with stickiness disabled, how do I do that?
I've found the option (setq gptel-highlight-assistant-responses t), which I enabled.
Is the way to disable stickiness to do (add-to-list 'text-property-default-nonsticky '(gptel . nil))?

daedsidog Nov 7, 2024

(add-to-list 'text-property-default-nonsticky '(gptel . t)) disables stickiness.
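A minimal sketch of what this setting changes, run in a scratch buffer rather than a real chat. The function name is hypothetical, and the `gptel` property value `response` is assumed to be how responses are marked. `insert-and-inherit` is used because, like typed input, it applies the stickiness rules, whereas plain `insert` never inherits properties.

```elisp
;; Demo only: `my/typed-text-role' is a hypothetical name, not part of gptel.
(defun my/typed-text-role (nonsticky)
  "Return the `gptel' property of text typed at the end of a response.
NONSTICKY is the value stored for `gptel' in
`text-property-default-nonsticky' for the duration of the demo."
  (with-temp-buffer
    (setq-local text-property-default-nonsticky
                (list (cons 'gptel nonsticky)))
    (insert (propertize "LLM reply" 'gptel 'response))
    ;; Simulate typing right after the response.
    (insert-and-inherit "!")
    (get-text-property (1- (point)) 'gptel)))
```

With `(my/typed-text-role nil)` the new character inherits `response` (sticky), while `(my/typed-text-role t)` returns nil, so the typed text counts as user text.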

Inkbottle007 Nov 8, 2024

@daedsidog I’ve tested both versions, with and without stickiness, and they perform as expected. I also believe the sticky version feels more natural and should be the default, while the non-sticky version should serve as an alternative.

Inkbottle007 · 2024-11-28T20:20:39Z

Inkbottle007
Nov 28, 2024

I believe the use of highlighting is the right answer to this question. I've been using @daedsidog's take on this solution for two weeks, but have reverted to my own implementation based on overlays, which "I have reason to believe" are best for the intended scenario. My version seems very robust, perhaps to the point of being a drawback, and requires some tweaking. However, you can already use it if you want, because it is very convenient: load it with (load-file "/path/to/gptel-highlight-v4.el"), then use gptel-highlight-start and gptel-highlight-stop.
I've incorporated @daedsidog's (add-to-list 'text-property-default-nonsticky '(gptel . nil)) and gptel-highlight-toggle-response.

Note that I am using the default convention of gptel, which is that the assistant text-property is gptel response and the user counterpart text-property is no text-property at all. Also, gptel-highlight-stop removes the stickiness.

gptel-highlight-v4.el

(I already have specific ideas on how to do the tweaking and will work on it asap.)

2 replies

daedsidog Nov 29, 2024

Note that I am using the default convention of gptel, which is that the assistant text-property is gptel response and the user counterpart text-property is no text-property at all. Also, gptel-highlight-stop removes the stickiness.

I don't recall if I mentioned this before or not, but I introduced my own convention in order to allow me to "toggle" the roles of specific responses. Removing the gptel text property results in not being able to revert a user message back to an assistant message. At the time it seemed like a good idea.

It's also worth mentioning that, IIRC, the gptel convention is not "no text property at all", but the gptel text property set to nil, which in some areas was overleniently considered equivalent.

Inkbottle007 Dec 1, 2024

Note that I am using the default convention of gptel, which is that the assistant text-property is gptel response and the user counterpart text-property is no text-property at all. Also, gptel-highlight-stop removes the stickiness.

I don't recall if I mentioned this before or not, but I introduced my own convention in order to allow me to "toggle" the roles of specific responses. Removing the gptel text property results in not being able to revert a user message back to an assistant message. At the time it seemed like a good idea.

Yes, I figured that could have been the reason. (Without it one can always make use of undo, though.)

It's also worth mentioning that, IIRC, the gptel convention is not "no text property at all", but the gptel text property set to nil, which in some areas was overleniently considered equivalent.

Well, when I mentioned the gptel convention, I am not sure if there is an explicit one. From what I observe, gptel adds the gptel response property to the text received from the server, and doesn't modify the rest of the buffer, leaving it as it were, unpropertized regarding gptel property.

From my perspective, I've been using (next-single-property-change pos 'gptel nil visible-end) because it's leaner than (get-char-property (max (point-min) (1- (point))) 'gptel).

With next-single-property-change, there is no need for a predicate, and you get the result directly as a return value. This approach avoids querying global variables through (prop-match-beginning prop) and (prop-match-end prop), and it aligns more closely with a functional programming style.

This function, next-single-property-change, has been working perfectly with the current gptel implementation but wasn't compatible with the pr-343 gptel property style.
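A rough sketch of this style of scan, assuming the convention described above (response text carries the `gptel` property with value `response`, everything else is user text). The function name is hypothetical and this is not gptel's own code.

```elisp
;; Hypothetical helper, not part of gptel.
(defun my/gptel-response-bounds ()
  "Return a list of (BEG . END) pairs, one per response run in the buffer."
  (let ((pos (point-min))
        bounds)
    (while (< pos (point-max))
      ;; Jump to the next position where the `gptel' property changes,
      ;; or to the end of the buffer if it never does.
      (let ((next (next-single-property-change pos 'gptel nil (point-max))))
        (when (eq (get-text-property pos 'gptel) 'response)
          (push (cons pos next) bounds))
        (setq pos next)))
    (nreverse bounds)))
```

In a buffer containing unmarked "user ", then "reply" marked as a response, then unmarked " more", this returns `((6 . 11))`. Typing unmarked text into the middle of a response splits its pair into two, which is the fragmentation discussed earlier in the thread.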
