feat: add support for new cohere command r models #118
base: main
Conversation
if (requestBody.message !== undefined) {
  // NOTE: We approximate the token count since this value is not directly available in the body.
  // According to Bedrock docs, they use (total_chars / 6) to approximate token count for pricing.
  // https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-prepare.html
  spanAttributes[AwsSpanProcessingUtil.GEN_AI_USAGE_INPUT_TOKENS] = Math.ceil(requestBody.message.length / 6);
}
According to the docs, this data should be available in the response body. However, when logging the response body in the implementation, the data is not actually there. As a result, I decided to stay with this token approximation approach.
if (responseBody.prompt !== undefined) {
} else if (currentModelId.includes('cohere.command-r')) {
  console.log('Response Body:', responseBody);
  if (responseBody.text !== undefined) {
    // NOTE: We approximate the token count since this value is not directly available in the body.
    // According to Bedrock docs, they use (total_chars / 6) to approximate token count for pricing.
The prompt is only available in the JavaScript implementation because of a special data model defined in an upstream OTel package, which makes it possible to approximate the input token usage from the response body.
However, this is not possible in the Java implementation, as there is no special data model wrapping the inputs into the response body. As a result, I decided to move this approximation logic strictly to the request body to keep the implementation logic consistent between languages.
    // According to Bedrock docs, they use (total_chars / 6) to approximate token count for pricing.
    // https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-prepare.html
    spanAttributes[AwsSpanProcessingUtil.GEN_AI_USAGE_INPUT_TOKENS] = Math.ceil(requestBody.message.length / 6);
  }
} else if (modelId.includes('cohere.command')) {
  if (requestBody.max_tokens !== undefined) {
I'm not sure if we should go ahead and remove support for the old Cohere Command model.
According to the docs, EOL should not be until 2025, but we are already getting 404s from this model.
@@ -265,7 +284,7 @@ export class BedrockRuntimeServiceExtension implements ServiceExtension {
   if (requestBody.top_p !== undefined) {
     spanAttributes[AwsSpanProcessingUtil.GEN_AI_REQUEST_TOP_P] = requestBody.top_p;
   }
-} else if (modelId.includes('mistral.mistral')) {
+} else if (modelId.includes('mistral')) {
We loosened this conditional because, out of the list of Mistral models, one of the model IDs starts with mistral.mixtral instead of mistral.mistral. The request/response body syntax is still the same for both.
Description of changes:
Adding support for Cohere Command R models. The previous Cohere Command models are not yet fully deprecated (EOL April 2025), so we still include support for them for now.
Beginning 11/05/24, calls to the old Cohere Command models throw a deprecation exception. I wasn't able to find any official announcement for this change, but I noticed it while testing the Java SDK during development.
Interestingly, calls to the old model still return a response, so the full gen_ai attributes are still generated for the time being.
Test Plan:
Verified that the attributes for the Command R model are generated with sample-app auto-instrumentation.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.