Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

268 enable configuration of the llm evalutation defence in sandbox #336

Conversation

pmarsh-scottlogic
Copy link
Contributor

@pmarsh-scottlogic pmarsh-scottlogic commented Sep 29, 2023

completes #268

changes

This PR works to make the instructions for the evaluation LLM cofigurable in the sandbox. See the screenshot

  • renames the defence type LLM_EVALUATION to EVALUATION_LLM_INSTRUCTIONS
  • adds defence configs to the DefenceInfo objects for EVALUATION_LLM_INSTRUCTIONS
  • populates prompts with default values from the templates
  • renames some methods and constants to disambiguate language
    image

Language diambiguation

  • We have the Evaluation LLM as part of our langchain architecture
  • The evaluation LLM is actually made of two separate models, which I call prompt injection eval and malicious prompt eval
  • Each of these models, when initialised in langchain take a prompt template. This is like a prompt, but with placeholders which then become replaced dynamically with concrete values by langchain at runtime, to make a prompt value
  • The prompt value is the concrete prompt with no placeholders that is given the the model.
  • A prompt template is made up of a pre prompt prepended to a main prompt.
  • The pre prompt is what is configurable by the user. It is what tell the model how to behave.
  • The main prompt contains the actual question asked to the LLM, with placeholder values. See promptTemplates.ts for examples

concerns with the PR

  • the text box that holds the prompt is necessarily quite big. I wonder if we should add a scroll bar past a threshold
  • I wonder if the names I came up with are too much of a mouthful

a potential helpful refactor for a different ticket

  • I wonder if we should refactor the Defence Config ids so that they are a shared enum or type across the front and backend rather than arbitrary strings. It didn't cause my any trouble yet but it strikes me as being vunerable to easy-to-make, hard-to-debug mistakes.

@pmarsh-scottlogic
Copy link
Contributor Author

Happy to undo those name changes btw if y'all disagree with them. I mostly did it for my own benefit

Copy link
Contributor

@gsproston-scottlogic gsproston-scottlogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Just a couple of things to tweak.

backend/src/defence.ts Outdated Show resolved Hide resolved
backend/src/openai.ts Outdated Show resolved Hide resolved
@pmarsh-scottlogic pmarsh-scottlogic marked this pull request as draft October 4, 2023 10:33
Copy link
Member

@chriswilty chriswilty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions, and some other general comments on the code where we are making life a bit difficult for ourselves. I will add a few new issues to cover the latter.

One final comment - our separation of concerns in the back-end is a bit fuzzy, and it's making it difficult to see what each type's responsibility is. I'll add an issue to look into that as well.

backend/test/unit/defence.test.ts Show resolved Hide resolved
backend/test/unit/defence.test.ts Show resolved Hide resolved
backend/src/openai.ts Outdated Show resolved Hide resolved
backend/src/openai.ts Show resolved Hide resolved
backend/src/langchain.ts Outdated Show resolved Hide resolved
backend/test/unit/langchain.test.ts Outdated Show resolved Hide resolved
backend/test/unit/langchain.test.ts Show resolved Hide resolved
frontend/src/Defences.ts Show resolved Hide resolved
frontend/src/models/defence.ts Show resolved Hide resolved
backend/src/defence.ts Show resolved Hide resolved
@pmarsh-scottlogic pmarsh-scottlogic marked this pull request as ready for review October 4, 2023 13:55
Copy link
Contributor

@gsproston-scottlogic gsproston-scottlogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work 👍

@chriswilty chriswilty force-pushed the 268-enable-configuration-of-the-llm-evalutation-defence-in-sandbox branch 2 times, most recently from 6ddaade to f7ea245 Compare October 13, 2023 09:23
@chriswilty chriswilty force-pushed the 268-enable-configuration-of-the-llm-evalutation-defence-in-sandbox branch from f7ea245 to 614902e Compare October 13, 2023 09:27
@chriswilty
Copy link
Member

@pmarsh-scottlogic Pushed some minor code cleanup, and solved the jest problem. This one's ready to go.

@chriswilty chriswilty force-pushed the 268-enable-configuration-of-the-llm-evalutation-defence-in-sandbox branch from 0c1f205 to 22fa317 Compare October 13, 2023 10:31
@chriswilty
Copy link
Member

I've updated the UI to preserve whitespace in the defence prompt boxes, which makes it more readable:

image

@chriswilty chriswilty self-assigned this Oct 13, 2023
@chriswilty chriswilty merged commit 480c212 into dev Oct 13, 2023
2 checks passed
@chriswilty chriswilty deleted the 268-enable-configuration-of-the-llm-evalutation-defence-in-sandbox branch October 13, 2023 11:00
chriswilty added a commit that referenced this pull request Apr 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable configuration of the LLM Evalutation defence in sandbox
3 participants