
Feature/autoeval #1043

Closed
wants to merge 2 commits

Conversation

erickgalinkin (Collaborator)

Partial fulfillment of #984

… with better cases and include format strings; update tier_1 analysis in deep_analysis.py to include jailbreak feedback from analytics.
@leondz (Collaborator) left a comment


OK maybe my bad for leaving the issue underspecified

For me the first part of this task is to prepare the artefacts used in qualitative review. The second is to select some pieces of text for suggestion in a model card.

I would prefer to amend this so that we get these features:

  1. Read an eval report.jsonl and identify failing scores, based on tier, absolute score, and calibration z-score
  2. Create a sheet of samples for qualitative analysis, where from each failing probe, a random selection of ten (or n) prompt:output pairs is given. This sheet would have four columns - probe, detector, prompt, output.

Let's take a chat elsewhere re: target workflow
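The two-step workflow described above could be sketched roughly as follows. This is a sketch, not garak's implementation: the field names (`entry_type`, `probe_classname`, `detector_results`, `outputs`) are assumed from garak's report format, and the fixed 0.5 failure threshold is a placeholder for the real criteria combining tier, absolute score, and calibration z-score.

```python
import json
import random


def sample_failures(report_path, n=10, seed=0):
    """Read a garak report.jsonl, find failing probe:detector pairs, and
    draw up to n random prompt:output pairs from each for qualitative review."""
    rows = []
    with open(report_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                rows.append(json.loads(line))

    # Group attempt entries by (probe, detector). An attempt counts as a
    # failure here when any detector score crosses 0.5 -- a placeholder
    # criterion standing in for the tier/absolute/z-score checks.
    failing = {}
    for row in rows:
        if row.get("entry_type") != "attempt":
            continue
        probe = row.get("probe_classname", "")
        for detector, scores in row.get("detector_results", {}).items():
            if any(s >= 0.5 for s in scores):
                failing.setdefault((probe, detector), []).append(
                    (row.get("prompt"), (row.get("outputs") or [None])[0])
                )

    # Build the four-column sheet: probe, detector, prompt, output.
    rng = random.Random(seed)
    sheet = []
    for (probe, detector), pairs in failing.items():
        for prompt, output in rng.sample(pairs, min(n, len(pairs))):
            sheet.append((probe, detector, prompt, output))
    return sheet
```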

from garak.data import path as data_path


TIER_1_PROBE_GROUPS = {

Grouping simplifies reporting. Can we get some guidelines on how the groups are defined, so that these future questions can be answered:

  1. "What do these reporting groups mean?"
  2. "Which group do I add this new probe to?"


TIER_1_PROBES = list(set().union(*TIER_1_PROBE_GROUPS.values()))  # unpack the group lists so union() merges probe names rather than erroring on unhashable lists

TIER_2_PROBE_GROUPS = {

Tier and group seem like orthogonal information; can they be stored in separate data structures?
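One possible split, as a sketch: group membership and tier live in separate flat structures, so each can evolve independently. The probe names are real garak probes, but the tier assignments and group names here are invented for illustration.

```python
# Hypothetical restructuring: reporting groups and tiers kept separately,
# instead of per-tier group dicts. Tier values here are illustrative only.
PROBE_GROUPS = {
    "jailbreak": ["dan.DanInTheWildMini", "grandma.Slurs"],
    "code_hallucination": ["packagehallucination.Python"],
}

PROBE_TIERS = {
    "dan.DanInTheWildMini": 1,
    "grandma.Slurs": 2,
    "packagehallucination.Python": 1,
}


def probes_for(tier: int) -> set:
    """All probes at a given tier, independent of grouping."""
    return {p for p, t in PROBE_TIERS.items() if t == tier}


def group_of(probe: str):
    """Reporting group a probe belongs to, or None if ungrouped."""
    for group, probes in PROBE_GROUPS.items():
        if probe in probes:
            return group
    return None
```

With this layout, "which group does a new probe go in?" and "which tier is it?" become two independent edits.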



@lru_cache
def load_calibration_data(calibration_filename=CALIBRATION_DATA):

garak.analyze.calibration does some of this processing already, may be a good place to concentrate logic for this kind of thing

Comment on lines +150 to +159
minus_sigma_scores = [
    probe_data[probe]["mu"] - probe_data[probe]["sigma"]
    for probe in v
    if probe in probe_data.keys()
]
plus_sigma_scores = [
    probe_data[probe]["mu"] + probe_data[probe]["sigma"]
    for probe in v
    if probe in probe_data.keys()
]

What do these represent? The ±1 s.d. bounds of an average model's score for a given probe:detector pair?
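If that reading is right, a calibration entry (mu, sigma from calibration.json) could be applied as in this minimal sketch; the one-standard-deviation cutoff is illustrative, not garak's actual policy, and `flag_score` is a hypothetical helper.

```python
# Sketch: classify one observed probe:detector score against the
# calibration distribution via its z-score. Threshold of 1 s.d. is
# illustrative only.
def flag_score(score: float, mu: float, sigma: float) -> str:
    z = (score - mu) / sigma  # calibration z-score
    if z < -1.0:
        return "below -1 s.d. of the average model"
    if z > 1.0:
        return "above +1 s.d. of the average model"
    return "within +/-1 s.d. of the average model"
```

For example, with the `dan.DanInTheWildMini/mitigation.MitigationBypass` numbers quoted below (mu ≈ 0.384, sigma ≈ 0.125), a score of 0.2 lands about 1.5 standard deviations below the average model.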

Comment on lines +171 to +179
def load_feedback(feedback_type: str) -> dict:
    match feedback_type:
        case "jailbreak":
            with open(JAILBREAK_FEEDBACK_LOCATION, "r", encoding="utf-8") as f:
                feedback = json.loads(f.read().strip())

        case _:
            feedback = dict()
    return feedback

can we use feedback files named according to a group and just skip if file not found, allowing us to drop the case statements?



def daninthewild_feedback(results: dict) -> str:
# Currently returns nothing, as we do not have a bag written.

from garak/garak/data/calibration/calibration.json:

    "dan.DanInTheWildMini/mitigation.MitigationBypass": {
      "mu": 0.38435897435897437,
      "sigma": 0.12534659378422308,
      "sw_p": 0.060981681027192626
    },


def deep_analysis(report_path, bag_path=ANALYSIS_FILE) -> Tuple[str, str]:
    """
    Take a garak report JSONL file and perform qualitative analysis on the probe results for the target.

perform qualitative analysis

It's all quantitative right? We compare quantities and choose blocks of text, no human in the loop, no qualitative method

@erickgalinkin erickgalinkin deleted the feature/autoeval branch December 16, 2024 16:13
@github-actions github-actions bot locked and limited conversation to collaborators Dec 16, 2024