Feature/autoeval #1043
Conversation
… with better cases and include format strings; update tier_1 analysis in deep_analysis.py to include jailbreak feedback from analytics.
OK, maybe my bad for leaving the issue underspecified.
For me, the first part of this task is to prepare the artefacts used in qualitative review. The second is to select some pieces of text to suggest for a model card.
I would prefer to amend this so that we get these features:
- Read an eval report.jsonl and identify failing scores, based on tier, absolute score, and calibration z-score
- Create a sheet of samples for qualitative analysis, where each failing probe contributes a random selection of ten (or n) prompt:output pairs. This sheet would have four columns: probe, detector, prompt, output.
Let's take a chat elsewhere re: target workflow
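A rough sketch of the sheet-building step described above, assuming the sheet is a CSV and that attempt records in report.jsonl carry `entry_type`, `probe_classname`, `prompt`, `outputs`, and `detector_results` fields (an assumption about the report schema; the function name and the way failing probes are passed in are hypothetical):

```python
import csv
import json
import random

def sample_failing_attempts(report_path, failing_probes, out_path, n=10):
    """Collect attempt records for failing probes from a garak-style
    report.jsonl and write a four-column sheet:
    probe, detector, prompt, output."""
    by_probe = {}
    with open(report_path, "r", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            # only attempt records carry prompt/output pairs
            if record.get("entry_type") != "attempt":
                continue
            probe = record.get("probe_classname")
            if probe not in failing_probes:
                continue
            # take the first detector name recorded for this attempt
            detector = next(iter(record.get("detector_results", {})), "")
            for output in record.get("outputs", []):
                by_probe.setdefault(probe, []).append(
                    (probe, detector, record.get("prompt", ""), output)
                )
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["probe", "detector", "prompt", "output"])
        for probe, rows in by_probe.items():
            # random selection of up to n pairs per failing probe
            for row in random.sample(rows, min(n, len(rows))):
                writer.writerow(row)
```

This only covers the sampling half; deciding which probes are "failing" (by tier, absolute score, and calibration z-score) would happen upstream.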
from garak.data import path as data_path

TIER_1_PROBE_GROUPS = {
Grouping simplifies reporting. Can we get some guidelines on how the groups are defined, so that these future questions can be answered:
- "What do these reporting groups mean?"
- "Which group do I add this new probe to?"
TIER_1_PROBES = list(set().union(TIER_1_PROBE_GROUPS.values()))

TIER_2_PROBE_GROUPS = {
Tier and group seem to be orthogonal pieces of information; can they be stored in separate data structures?
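For illustration, one way to keep the two axes separate, with flat per-probe mappings (probe names and tier assignments here are placeholders, not the PR's actual data):

```python
# Orthogonal structures: group membership and tier assignment are
# recorded independently, so neither has to be duplicated per tier.
PROBE_GROUPS = {
    "dan.DanInTheWildMini": "jailbreak",
    "leakreplay.LiteratureCloze": "data_leakage",
}

PROBE_TIERS = {
    "dan.DanInTheWildMini": 1,
    "leakreplay.LiteratureCloze": 2,
}

def probes_in_tier(tier: int) -> list[str]:
    """Recover a tier's probe list from the flat mapping."""
    return [p for p, t in PROBE_TIERS.items() if t == tier]
```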
@lru_cache
def load_calibration_data(calibration_filename=CALIBRATION_DATA):
garak.analyze.calibration does some of this processing already; it may be a good place to concentrate logic for this kind of thing.
minus_sigma_scores = [
    probe_data[probe]["mu"] - probe_data[probe]["sigma"]
    for probe in v
    if probe in probe_data.keys()
]
plus_sigma_scores = [
    probe_data[probe]["mu"] + probe_data[probe]["sigma"]
    for probe in v
    if probe in probe_data.keys()
]
what do these represent? the ±1 s.d. bounds of an average model's score for a given probe:detector pair?
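If so, a minimal reading of the quoted comprehensions, using the calibration entry quoted elsewhere in this thread (the mu/sigma values from calibration.json; the helper name is hypothetical):

```python
# Calibration entries record the mean (mu) and standard deviation
# (sigma) of scores across a reference population of models, so
# mu - sigma and mu + sigma are the +/-1 s.d. bounds for a given
# probe:detector pair.
probe_data = {
    "dan.DanInTheWildMini/mitigation.MitigationBypass": {
        "mu": 0.38435897435897437,
        "sigma": 0.12534659378422308,
    }
}

def sigma_bounds(key: str) -> tuple[float, float]:
    """Return the (mu - sigma, mu + sigma) bounds for one pair."""
    entry = probe_data[key]
    return entry["mu"] - entry["sigma"], entry["mu"] + entry["sigma"]
```

Under that reading, a target scoring below the lower bound is more than one standard deviation worse than the average calibrated model on that probe:detector pair.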
def load_feedback(feedback_type: str) -> dict:
    match feedback_type:
        case "jailbreak":
            with open(JAILBREAK_FEEDBACK_LOCATION, "r", encoding="utf-8") as f:
                feedback = json.loads(f.read().strip())

        case _:
            feedback = dict()
    return feedback
can we use feedback files named according to a group and just skip if file not found, allowing us to drop the case statements?
def daninthewild_feedback(results: dict) -> str:
    # Currently returns nothing, as we do not have a bag written.
From garak/garak/data/calibration/calibration.json:

"dan.DanInTheWildMini/mitigation.MitigationBypass": {
    "mu": 0.38435897435897437,
    "sigma": 0.12534659378422308,
    "sw_p": 0.060981681027192626
},
def deep_analysis(report_path, bag_path=ANALYSIS_FILE) -> Tuple[str, str]:
    """
    Take garak report jsonl file and perform qualitative analysis on the probe results for the target.
"perform qualitative analysis"

It's all quantitative, right? We compare quantities and choose blocks of text; there's no human in the loop and no qualitative method.
Partial fulfillment of #984