Replies: 1 comment
-
another possibility: Strawberry's reward predictors were trained using a multi-agent framework |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
We're having a conversation that perhaps that the Strawberry system might right its own system prompts at instantiation due to some of the inconsistencies we've seen in leaks. It could also be that early leaks were a different version.
Beta Was this translation helpful? Give feedback.
All reactions