TokenCount performance issues (microsoft#580)
### Motivation and Context

We encountered performance issues with token counting and implemented the
following fix.

<!-- Thank you for your contribution to the chat-copilot repo!
Please help reviewers and future users, providing the following
information:
  1. Why is this change required?
  2. What problem does it solve?
  3. What scenario does it contribute to?
  4. If it fixes an open issue, please link to the issue here.
-->

### Description

<!-- Describe your changes, the overall approach, the underlying design.
These notes will help understanding how your code works. Thanks! -->

We updated the TokenUtils class in the Skills Web API to address these
issues. The SharpToken tokenizer used for encoding text is now created
once and cached in a static field instead of being re-instantiated on
every call, which significantly improves the efficiency of the TokenCount
method. As a result, token counting operations are faster, ensuring
quicker responses for users.

### Contribution Checklist

<!-- Before submitting this PR, please make sure: -->

- [ ] The code builds clean without any errors or warnings
- [ ] The PR follows the [Contribution
Guidelines](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [ ] All unit tests pass, and I have added new tests where possible
- [ ] I didn't break anyone 😄
JohanYman authored Nov 7, 2023
1 parent 68d6e0a commit 39edfcc
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion webapi/Skills/Utils/TokenUtils.cs
```diff
@@ -17,6 +17,8 @@ namespace CopilotChat.WebApi.Skills.Utils;
 /// </summary>
 public static class TokenUtils
 {
+    private static SharpToken.GptEncoding tokenizer = SharpToken.GptEncoding.GetEncoding("cl100k_base");
+
     /// <summary>
     /// Semantic dependencies of ChatSkill.
     /// If you add a new semantic dependency, please add it here.
@@ -98,7 +100,6 @@ internal static void GetFunctionTokenUsage(SKContext result, SKContext chatConte
     /// <param name="text">The string to calculate the number of tokens in.</param>
     internal static int TokenCount(string text)
     {
-        var tokenizer = SharpToken.GptEncoding.GetEncoding("cl100k_base");
         var tokens = tokenizer.Encode(text);
         return tokens.Count;
     }
```
