268 enable configuration of the llm evalutation defence in sandbox #336

pmarsh-scottlogic · 2023-09-29T13:42:41Z

completes #268

changes

This PR works to make the instructions for the evaluation LLM cofigurable in the sandbox. See the screenshot

renames the defence type LLM_EVALUATION to EVALUATION_LLM_INSTRUCTIONS
adds defence configs to the DefenceInfo objects for EVALUATION_LLM_INSTRUCTIONS
populates prompts with default values from the templates
renames some methods and constants to disambiguate language

Language diambiguation

We have the Evaluation LLM as part of our langchain architecture
The evaluation LLM is actually made of two separate models, which I call prompt injection eval and malicious prompt eval
Each of these models, when initialised in langchain take a prompt template. This is like a prompt, but with placeholders which then become replaced dynamically with concrete values by langchain at runtime, to make a prompt value
The prompt value is the concrete prompt with no placeholders that is given the the model.
A prompt template is made up of a pre prompt prepended to a main prompt.
The pre prompt is what is configurable by the user. It is what tell the model how to behave.
The main prompt contains the actual question asked to the LLM, with placeholder values. See promptTemplates.ts for examples

concerns with the PR

the text box that holds the prompt is necessarily quite big. I wonder if we should add a scroll bar past a threshold
I wonder if the names I came up with are too much of a mouthful

a potential helpful refactor for a different ticket

I wonder if we should refactor the Defence Config ids so that they are a shared enum or type across the front and backend rather than arbitrary strings. It didn't cause my any trouble yet but it strikes me as being vunerable to easy-to-make, hard-to-debug mistakes.

…TRUCTIONS

…end, which allows the inputs to appear in the defence panel

…th default values

…ompt

…ctionEvalPrePrompt

…edDefences to init model

…riable first

…n mainPrompt

…t and main prompt for model init

pmarsh-scottlogic · 2023-10-03T15:52:53Z

Happy to undo those name changes btw if y'all disagree with them. I mostly did it for my own benefit

…on-defence-in-sandbox

gsproston-scottlogic

Looks good to me! Just a couple of things to tweak.

backend/src/defence.ts

backend/src/openai.ts

chriswilty

A few suggestions, and some other general comments on the code where we are making life a bit difficult for ourselves. I will add a few new issues to cover the latter.

One final comment - our separation of concerns in the back-end is a bit fuzzy, and it's making it difficult to see what each type's responsibility is. I'll add an issue to look into that as well.

backend/test/unit/defence.test.ts

backend/src/openai.ts

backend/src/langchain.ts

backend/test/unit/langchain.test.ts

frontend/src/Defences.ts

frontend/src/models/defence.ts

backend/src/defence.ts

gsproston-scottlogic

Nice work 👍

chriswilty · 2023-10-13T09:28:07Z

@pmarsh-scottlogic Pushed some minor code cleanup, and solved the jest problem. This one's ready to go.

chriswilty · 2023-10-13T10:38:10Z

I've updated the UI to preserve whitespace in the defence prompt boxes, which makes it more readable:

…on-defence-in-sandbox

) --------- Co-authored-by: Chris Wilton-Magras <[email protected]>

pmarsh-scottlogic added 7 commits September 28, 2023 15:50

renames defence trypr LLM_EVALUATION to EVALUATION_LLM_INSTRUCTIONS

a6c0cfc

correct grammar in promptInjectionEvalTemplate

5a323c3

renames defence type in frontend LLM_EVALUATION to EVALUATION_LLM_INS…

8cd6503

…TRUCTIONS

renames to EVALUATION_LLM_INSTRUTIONS on the defences panel

05f2f1f

renames to Evaluation LLM instructions on the defences panel

2cbb7b4

Adds defence configs to Evaluation llm instructions defenceInfo front…

a678573

…end, which allows the inputs to appear in the defence panel

Adds defence configs to Evaluation_LLM_EVALUATIONS in the backend, wi…

2e06666

…th default values

pmarsh-scottlogic closed this Sep 29, 2023

pmarsh-scottlogic added 15 commits September 29, 2023 16:28

Separates promptInjectionEvalTemplate into the template and the prepr…

0d59556

…ompt

makes templates file export promptInjectionEvalPrePrompt

37a986e

renames qaPrompt to qaPrePrompt

5661b1d

prompt injection evaluator preprompt now taken from the session

78b83a0

renames qAcontextTemplate to qAMainPrompt

3c9228c

renames retrievalQAPrePrompt to qAPrePrompt

a6f753e

renames retrievalQAPrePromptSecure to qAPrePromptSecure

a3424d3

renames promptInjectionEvalTemplate to promptInjectionEvalMainPrompt

16963c9

renames generic prePrompt parameter in evaluation model to promptInje…

254a920

…ctionEvalPrePrompt

passes maliciousPromptEvalPrePrompt down the chain from detectTrigger…

e843149

…edDefences to init model

sets promptInjectionEvalTemplate directly rather than setting to a va…

f10a657

…riable first

sets promptInjectionEvalPrePrompt as default session value rather tha…

9711071

…n mainPrompt

splits malicious prompt eval into prePrompt and mainPrompt

043c576

constructs the malicious prompt eval template from the given prepromp…

874fd82

…t and main prompt for model init

adds method to get malicious prompt eval pre prompt from session storage

7fda12a

pmarsh-scottlogic reopened this Oct 2, 2023

exports getMaliciousPromptEvalPrePrompt from defence.ts

a693ea1

pmarsh-scottlogic closed this Oct 2, 2023

pmarsh-scottlogic added 4 commits October 2, 2023 11:54

fixes eval llm initialisation in langchain unit tests

1cd5acc

fixes eval llm initialisation in defences integration tests

d47ec08

fix instantiation of eval LLM in langchain integration tests

fa3f7b5

adds generic function for making the prompt template

77293af

pmarsh-scottlogic requested review from heatherlogan-scottlogic, gsproston-scottlogic and chriswilty October 3, 2023 15:53

Merge branch 'dev' into 268-enable-configuration-of-the-llm-evalutati…

8d28e33

…on-defence-in-sandbox

gsproston-scottlogic linked an issue Oct 4, 2023 that may be closed by this pull request

Enable configuration of the LLM Evalutation defence in sandbox #268

Closed

gsproston-scottlogic suggested changes Oct 4, 2023

View reviewed changes

backend/src/defence.ts Outdated Show resolved Hide resolved

backend/src/openai.ts Outdated Show resolved Hide resolved

removes unnecessarry import as

12e2c15

pmarsh-scottlogic marked this pull request as draft October 4, 2023 10:33

pmarsh-scottlogic added 2 commits October 4, 2023 11:55

always gets eval LLM instructions from config

82b3736

adds a new line between pre prompt and main prompt for prompt template

57f45ed

chriswilty requested changes Oct 4, 2023

View reviewed changes

fixes tests which were broken by the new line

79b90c7

pmarsh-scottlogic marked this pull request as ready for review October 4, 2023 13:55

gsproston-scottlogic approved these changes Oct 6, 2023

View reviewed changes

chriswilty mentioned this pull request Oct 6, 2023

Separation of concerns in back-end code #372

Closed

chriswilty force-pushed the 268-enable-configuration-of-the-llm-evalutation-defence-in-sandbox branch 2 times, most recently from 6ddaade to f7ea245 Compare October 13, 2023 09:23

Fix jest mock issue, typo, other warnings

614902e

chriswilty force-pushed the 268-enable-configuration-of-the-llm-evalutation-defence-in-sandbox branch from f7ea245 to 614902e Compare October 13, 2023 09:27

chriswilty approved these changes Oct 13, 2023

View reviewed changes

Correct dom access, preserve whitespace in preprompts

22fa317

chriswilty force-pushed the 268-enable-configuration-of-the-llm-evalutation-defence-in-sandbox branch from 0c1f205 to 22fa317 Compare October 13, 2023 10:31

Merge branch 'dev' into 268-enable-configuration-of-the-llm-evalutati…

e7df174

…on-defence-in-sandbox

chriswilty self-assigned this Oct 13, 2023

chriswilty merged commit 480c212 into dev Oct 13, 2023
2 checks passed

chriswilty deleted the 268-enable-configuration-of-the-llm-evalutation-defence-in-sandbox branch October 13, 2023 11:00

chriswilty added a commit that referenced this pull request Apr 8, 2024

268 enable configuration of the llm evalutation defence in sandbox (#336

e6d7742

) --------- Co-authored-by: Chris Wilton-Magras <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

268 enable configuration of the llm evalutation defence in sandbox #336

268 enable configuration of the llm evalutation defence in sandbox #336

pmarsh-scottlogic commented Sep 29, 2023 •

edited

Loading

pmarsh-scottlogic commented Oct 3, 2023

gsproston-scottlogic left a comment

chriswilty left a comment

gsproston-scottlogic left a comment

chriswilty commented Oct 13, 2023

chriswilty commented Oct 13, 2023

268 enable configuration of the llm evalutation defence in sandbox #336

268 enable configuration of the llm evalutation defence in sandbox #336

Conversation

pmarsh-scottlogic commented Sep 29, 2023 • edited Loading

changes

Language diambiguation

concerns with the PR

a potential helpful refactor for a different ticket

pmarsh-scottlogic commented Oct 3, 2023

gsproston-scottlogic left a comment

Choose a reason for hiding this comment

chriswilty left a comment

Choose a reason for hiding this comment

gsproston-scottlogic left a comment

Choose a reason for hiding this comment

chriswilty commented Oct 13, 2023

chriswilty commented Oct 13, 2023

pmarsh-scottlogic commented Sep 29, 2023 •

edited

Loading