Merge pull request #6 from annahedstroem/bridge

Bridge
annahedstroem authored Sep 4, 2023
2 parents b58f58b + 0796e52 commit e59414e
Showing 24 changed files with 1,191 additions and 8,841 deletions.
5 changes: 3 additions & 2 deletions .github/workflows/python-package.yml
@@ -38,9 +38,10 @@ jobs:
         # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
         flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
         # run mypy
-        mypy quantus
+        # mypy metaquantus
+        # run balck
-        black quantus
+        black metaquantus
     - name: Test with pytest
       run: |
         export PYTHONPATH=${PYTHONPATH}:.
         pytest
3 changes: 2 additions & 1 deletion .gitignore
@@ -9,4 +9,5 @@ dist/
 *.egg-info/
 .coverage*
 coverage.xml
-venvs/
+venvs/
+*.ipynb_checkpoints/
44 changes: 22 additions & 22 deletions README.md
@@ -125,61 +125,61 @@ To reproduce the results of this paper, you will need to follow these three step

 1. **Generate the dataset.** Run the notebook [
 Tutorial-Data-Generation-Experiments.ipynb](https://github.com/annahedstroem/MetaQuantus/blob/main/tutorials/Tutorial-Data-Generation-Experiments.ipynb) to generate the necessary data for the experiments. This notebook will guide you through the process of downloading and preprocessing the data in order to save it to appropriate test sets. Please store the models in a folder called `assets/models/` and the tests sets under `assets/test_sets/`.
-2. **Run the experiments.** To obtain the results for the respective experiments, you have to run the respective Python scripts which are detailed below. All these Python files are located in the `scripts/` folder. If you want to run the experiments on other explanation methods, datasets or models, feel free to change the hyperparameters.
-3. **Analyse the results.** Once the results are obtained for your chosen experiments, run the [Tutorial-Reproduce-Paper-Experiments.ipynb](https://github.com/annahedstroem/MetaQuantus/blob/main/tutorials/Tutorial-Reproduce-Experiments.ipynb) to analyse the results. (In the notebook itself, we have also listed which specific Python scripts that need to be run in order to obtain the results for this analysis step.)
+2. **Run the experiments.** To obtain the results for the respective experiments, you have to run the respective Python experiments which are detailed below. All these Python files are located in the `experiments/` folder. If you want to run the experiments on other explanation methods, datasets or models, feel free to change the hyperparameters.
+3. **Analyse the results.** Once the results are obtained for your chosen experiments, run the [Tutorial-Reproduce-Paper-Experiments.ipynb](https://github.com/annahedstroem/MetaQuantus/blob/main/tutorials/Tutorial-Reproduce-Experiments.ipynb) to analyse the results. (In the notebook itself, we have also listed which specific Python experiments that need to be run in order to obtain the results for this analysis step.)
 
 <details>
 <summary><b><normal>Additional details on step 2 (Run the Experiments)</normal></b></summary>
 
 **Test**: Go to the root folder and run a simple test that meta-evaluation work.
 ```bash
-python3 scripts/run_test.py --K=5 --iters=10 --dataset=MNIST
+python3 experiments/run_test.py --K=5 --iters=10 --dataset=MNIST
 ```
 
 **Application**: Run the benchmarking experiments (also used for category convergence analysis).
 ```bash
-python3 scripts/run_benchmarking.py --dataset=MNIST --fname=f --K=5 --iters=3
-python3 scripts/run_benchmarking.py --dataset=fMNIST --fname=f --K=5 --iters=3
-python3 scripts/run_benchmarking.py --dataset=cMNIST --fname=f --K=5 --iters=3
-python3 scripts/run_benchmarking.py --dataset=ImageNet --fname=ResNet18 --K=5 --iters=3 --batch_size=50 --start_idx_fixed=100 --end_idx_fixed=150 --reverse_order=False --folder=benchmarks_imagenet/ --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
+python3 experiments/run_benchmarking.py --dataset=MNIST --fname=f --K=5 --iters=3
+python3 experiments/run_benchmarking.py --dataset=fMNIST --fname=f --K=5 --iters=3
+python3 experiments/run_benchmarking.py --dataset=cMNIST --fname=f --K=5 --iters=3
+python3 experiments/run_benchmarking.py --dataset=ImageNet --fname=ResNet18 --K=5 --iters=3 --batch_size=50 --start_idx_fixed=100 --end_idx_fixed=150 --reverse_order=False --folder=benchmarks_imagenet/ --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
 ```
 
 **Application**: Run hyperparameter optimisation experiment.
 ```bash
-python3 scripts/run_hp.py --dataset=MNIST --K=3 --iters=2 --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
-python3 scripts/run_hp.py --dataset=ImageNet --K=3 --iters=2 --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
+python3 experiments/run_hp.py --dataset=MNIST --K=3 --iters=2 --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
+python3 experiments/run_hp.py --dataset=ImageNet --K=3 --iters=2 --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
 ```
 
 **Experiment**: Run the faithfulness ranking disagreement exercise.
 ```bash
-python3 scripts/run_ranking.py --dataset=cMNIST --fname=f --K=5 --iters=3 --category=Faithfulness --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
+python3 experiments/run_ranking.py --dataset=cMNIST --fname=f --K=5 --iters=3 --category=Faithfulness --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
 ```
 
 **Sanity-Check**: Run sanity-checking exercise: adversarial estimators.
 ```bash
-python3 scripts/run_sanity_checks.py --dataset=ImageNet --K=3 --iters=2 --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
+python3 experiments/run_sanity_checks.py --dataset=ImageNet --K=3 --iters=2 --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
 ```
 
 **Sanity-Check**: Run sanity-checking exercise: L dependency.
 ```bash
-python3 scripts/run_l_dependency.py --dataset=MNIST --K=5 --iters=3 --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
-python3 scripts/run_l_dependency.py --dataset=fMNIST --K=5 --iters=3 --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
-python3 scripts/run_l_dependency.py --dataset=cMNIST --K=5 --iters=3 --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
+python3 experiments/run_l_dependency.py --dataset=MNIST --K=5 --iters=3 --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
+python3 experiments/run_l_dependency.py --dataset=fMNIST --K=5 --iters=3 --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
+python3 experiments/run_l_dependency.py --dataset=cMNIST --K=5 --iters=3 --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
 ```
 
 **Benchmarking Transformers**: Run transformer benchmarking experiment.
 ```bash
-python3 scripts/run_benchmarking_transformers.py --dataset=ImageNet --K=5 --iters=3 --start_idx=0 --end_idx=40 --category=localisation --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
-python3 scripts/run_benchmarking_transformers.py --dataset=ImageNet --K=5 --iters=3 --start_idx=40 --end_idx=80 --category=localisation --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
-python3 scripts/run_benchmarking_transformers.py --dataset=ImageNet --K=5 --iters=3 --start_idx=80 --end_idx=120 --category=localisation --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
-python3 scripts/run_benchmarking_transformers.py --dataset=ImageNet --K=5 --iters=3 --start_idx=120 --end_idx=160 --category=localisation --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
+python3 experiments/run_benchmarking_transformers.py --dataset=ImageNet --K=5 --iters=3 --start_idx=0 --end_idx=40 --category=localisation --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
+python3 experiments/run_benchmarking_transformers.py --dataset=ImageNet --K=5 --iters=3 --start_idx=40 --end_idx=80 --category=localisation --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
+python3 experiments/run_benchmarking_transformers.py --dataset=ImageNet --K=5 --iters=3 --start_idx=80 --end_idx=120 --category=localisation --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
+python3 experiments/run_benchmarking_transformers.py --dataset=ImageNet --K=5 --iters=3 --start_idx=120 --end_idx=160 --category=localisation --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
 ```
 
 ```bash
-python3 scripts/run_benchmarking_transformers.py --dataset=ImageNet --K=5 --iters=3 --start_idx=40 --end_idx=80 --category=complexity --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
-python3 scripts/run_benchmarking_transformers.py --dataset=ImageNet --K=5 --iters=3 --start_idx=0 --end_idx=40 --category=complexity --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
-python3 scripts/run_benchmarking_transformers.py --dataset=ImageNet --K=5 --iters=3 --start_idx=80 --end_idx=120 --category=complexity --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
-python3 scripts/run_benchmarking_transformers.py --dataset=ImageNet --K=5 --iters=3 --start_idx=120 --end_idx=160 --category=complexity --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
+python3 experiments/run_benchmarking_transformers.py --dataset=ImageNet --K=5 --iters=3 --start_idx=40 --end_idx=80 --category=complexity --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
+python3 experiments/run_benchmarking_transformers.py --dataset=ImageNet --K=5 --iters=3 --start_idx=0 --end_idx=40 --category=complexity --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
+python3 experiments/run_benchmarking_transformers.py --dataset=ImageNet --K=5 --iters=3 --start_idx=80 --end_idx=120 --category=complexity --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
+python3 experiments/run_benchmarking_transformers.py --dataset=ImageNet --K=5 --iters=3 --start_idx=120 --end_idx=160 --category=complexity --PATH_ASSETS=../assets/ --PATH_RESULTS=results/
 ```
 </details>

18 changes: 18 additions & 0 deletions experiments/experiment_kwargs/bridge_estimators_101.ini
@@ -0,0 +1,18 @@
+[DEFAULT]
+perturbation_levels=None
+nr_levels=5
+nr_samples=10
+x_noise=0.01
+abs=False
+normalise=True
+similarity_func=quantus.similarity_func.correlation_spearman
+measure_func=quantus.similarity_func.squared_difference
+normalise_func=quantus.normalise_func.normalise_by_average_second_moment_estimate
+return_aggregate=False
+disable_warnings=True
+display_progressbar=False
+xai_settings = {"MNIST": ["Saliency", "InputXGradient", "LayerGradCam", "GradientShap"],
+    "fMNIST": ["Saliency", "InputXGradient", "LayerGradCam", "GradientShap"],
+    "cMNIST": ["Gradient", "InputXGradient", "LayerGradCam"],
+    "ImageNet": ["Saliency", "InputXGradient", "GradientShap"],}
+std_max = {"MNIST": 2.0, "fMNIST": 2.0, "cMNIST": 0.75, "ImageNet": 0.5}
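
The values in this new kwargs file are Python expressions serialized as strings — booleans, numbers, dicts, and dotted references to `quantus` functions — so the consuming experiment script has to evaluate them after parsing. A minimal sketch of how such a file could be loaded, assuming `configparser` plus `eval`; the `load_experiment_kwargs` helper is a hypothetical illustration, not code from this PR:

```python
import configparser

import quantus  # assumed importable; the config values reference quantus.* functions


def load_experiment_kwargs(path: str) -> dict:
    """Hypothetical loader: parse an experiment kwargs .ini file into Python objects."""
    config = configparser.ConfigParser()
    config.read(path)
    kwargs = {}
    for key, raw in config["DEFAULT"].items():
        # Every value ('True', '0.01', 'quantus.similarity_func.correlation_spearman',
        # multi-line dicts, ...) is stored as a string; eval it with quantus in scope.
        kwargs[key] = eval(raw, {"quantus": quantus})
    return kwargs


kwargs = load_experiment_kwargs("experiments/experiment_kwargs/bridge_estimators_101.ini")
print(kwargs["nr_levels"], kwargs["std_max"]["ImageNet"])  # expected: 5 0.5
```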
12 changes: 12 additions & 0 deletions experiments/experiment_kwargs/bridge_estimators_101_hp.ini
@@ -0,0 +1,12 @@
+[DEFAULT]
+nr_models = [1, 5, 10]
+nr_levels = [2, 5, 10, 20]
+dist_funcs = {
+    "sq": quantus.similarity_func.squared_difference,
+    "cos": quantus.similarity_func.cosine,
+    "euc": quantus.similarity_func.distance_euclidean,
+    }
+simi_funcs = {
+    "pear": quantus.similarity_func.correlation_pearson,
+    "spear": quantus.similarity_func.correlation_spearman,
+    }
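
This second config declares a small hyperparameter search grid (3 × 4 × 3 × 2 = 72 combinations). A sketch of how the grid might be expanded into individual meta-evaluation settings once the values are parsed; the sweep loop and the `run_id` naming are illustrative assumptions, not taken from the repository:

```python
from itertools import product

import quantus  # assumed importable, as in the config above

# The grid as declared in bridge_estimators_101_hp.ini (after eval'ing the strings).
nr_models = [1, 5, 10]
nr_levels = [2, 5, 10, 20]
dist_funcs = {
    "sq": quantus.similarity_func.squared_difference,
    "cos": quantus.similarity_func.cosine,
    "euc": quantus.similarity_func.distance_euclidean,
}
simi_funcs = {
    "pear": quantus.similarity_func.correlation_pearson,
    "spear": quantus.similarity_func.correlation_spearman,
}

# One setting per grid point; a real script would configure and launch a run for each.
for n_models, n_levels, (d_name, d_func), (s_name, s_func) in product(
    nr_models, nr_levels, dist_funcs.items(), simi_funcs.items()
):
    run_id = f"m{n_models}_lv{n_levels}_{d_name}_{s_name}"
    settings = {
        "nr_models": n_models,
        "nr_levels": n_levels,
        "measure_func": d_func,
        "similarity_func": s_func,
    }
    print(run_id, sorted(settings))
```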
66 changes: 41 additions & 25 deletions scripts/run_benchmarking.py → experiments/run_benchmarking.py
@@ -50,24 +50,35 @@
 start_idx_fixed = eval(args.start_idx_fixed)
 PATH_ASSETS = str(args.PATH_ASSETS)
 PATH_RESULTS = str(args.PATH_RESULTS)
-print(dataset_name, K, iters, batch_size, fname, reverse_order, folder, start_idx_fixed, end_idx_fixed, PATH_ASSETS, PATH_RESULTS)
+print(
+    dataset_name,
+    K,
+    iters,
+    batch_size,
+    fname,
+    reverse_order,
+    folder,
+    start_idx_fixed,
+    end_idx_fixed,
+    PATH_ASSETS,
+    PATH_RESULTS,
+)
 
 #########
 # GPUs. #
 #########
 
 # Setting device on GPU if available, else CPU.
 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-print("Using device:", device)
-print()
-print(torch.version.cuda)
+print("\nUsing device:", device)
+print(f"\t{torch.version.cuda}")
 
 # Additional info when using cuda.
 if device.type == "cuda":
-    print(torch.cuda.get_device_name(0))
-    print("Memory Usage:")
-    print("Allocated:", round(torch.cuda.memory_allocated(0) / 1024 ** 3, 1), "GB")
-    print("Cached: ", round(torch.cuda.memory_cached(0) / 1024 ** 3, 1), "GB")
+    print(f"\t{torch.cuda.get_device_name(0)}")
+    print("\tMemory Usage:")
+    print("\tAllocated:", round(torch.cuda.memory_allocated(0) / 1024 ** 3, 1), "GB")
+    print("\tCached: ", round(torch.cuda.memory_cached(0) / 1024 ** 3, 1), "GB")
 
 # Reduce the number of explanation methods and samples for ImageNet.
 if dataset_name == "ImageNet":
@@ -83,19 +94,19 @@
     dataset_name=dataset_name, path_assets=PATH_ASSETS, device=device
 )
 dataset_settings = {dataset_name: SETTINGS[dataset_name]}
-dataset_kwargs = dataset_settings[dataset_name]["estimator_kwargs"]
+estimator_kwargs = dataset_settings[dataset_name]["estimator_kwargs"]
 
 # Get analyser suite.
 analyser_suite = setup_test_suite(dataset_name=dataset_name)
 
 # Get estimators.
 estimators = setup_estimators(
-    features=dataset_kwargs["features"],
-    num_classes=dataset_kwargs["num_classes"],
-    img_size=dataset_kwargs["img_size"],
-    percentage=dataset_kwargs["percentage"],
-    patch_size=dataset_kwargs["patch_size"],
-    perturb_baseline=dataset_kwargs["perturb_baseline"],
+    features=estimator_kwargs["features"],
+    num_classes=estimator_kwargs["num_classes"],
+    img_size=estimator_kwargs["img_size"],
+    percentage=estimator_kwargs["percentage"],
+    patch_size=estimator_kwargs["patch_size"],
+    perturb_baseline=estimator_kwargs["perturb_baseline"],
 )
 
 estimators_sub = {
@@ -109,8 +120,8 @@
 # Get explanation methods.
 xai_methods = setup_xai_methods(
     gc_layer=dataset_settings[dataset_name]["gc_layers"][model_name],
-    img_size=dataset_kwargs["img_size"],
-    nr_channels=dataset_kwargs["nr_channels"],
+    img_size=estimator_kwargs["img_size"],
+    nr_channels=estimator_kwargs["nr_channels"],
 )
 
 ###########################
@@ -156,8 +167,9 @@
     }
 elif fname == "Deit":
     dataset_settings[dataset_name]["models"] = {
-        "Deit": timm.create_model(model_name='deit_tiny_distilled_patch16_224',
-                                  pretrained=True),
+        "Deit": timm.create_model(
+            model_name="deit_tiny_distilled_patch16_224", pretrained=True
+        ),
     }
 
 # Prepare batching.
@@ -172,7 +184,7 @@

     # Get indicies.
     end_idx = min(int(start_idx + batch_size), nr_samples)
-    if (end_idx-start_idx) < batch_size:
+    if (end_idx - start_idx) < batch_size:
         continue
 
     if end_idx_fixed:
@@ -192,9 +204,15 @@
     )
 
     # Reduce the number of samples.
-    dataset_settings[dataset_name]["x_batch"] = dataset_settings[dataset_name]["x_batch"][start_idx:end_idx]
-    dataset_settings[dataset_name]["y_batch"] = dataset_settings[dataset_name]["y_batch"][start_idx:end_idx]
-    dataset_settings[dataset_name]["s_batch"] = dataset_settings[dataset_name]["s_batch"][start_idx:end_idx]
+    dataset_settings[dataset_name]["x_batch"] = dataset_settings[dataset_name][
+        "x_batch"
+    ][start_idx:end_idx]
+    dataset_settings[dataset_name]["y_batch"] = dataset_settings[dataset_name][
+        "y_batch"
+    ][start_idx:end_idx]
+    dataset_settings[dataset_name]["s_batch"] = dataset_settings[dataset_name][
+        "s_batch"
+    ][start_idx:end_idx]
 
     # Benchmark!
     benchmark = MetaEvaluationBenchmarking(
@@ -212,5 +230,3 @@

     if start_idx_fixed is not None:
         break
-
-
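
For orientation, the batching logic touched by this file's diff slices the test set into `batch_size`-sized index windows, skips a trailing partial window, and lets `--start_idx_fixed`/`--end_idx_fixed` pin the loop to one specific window. A condensed, self-contained sketch of that control flow with simplified stand-in values (the real script reads them from the CLI and runs the benchmark per window):

```python
# Condensed sketch of the batching control flow in run_benchmarking.py.
nr_samples = 160                           # size of the loaded test set
batch_size = 50
start_idx_fixed, end_idx_fixed = 100, 150  # None unless passed on the CLI

for start_idx in range(0, nr_samples, batch_size):
    end_idx = min(start_idx + batch_size, nr_samples)
    if (end_idx - start_idx) < batch_size:
        continue  # skip a trailing window smaller than batch_size

    # Fixed indices override the loop's own window.
    if end_idx_fixed is not None:
        end_idx = end_idx_fixed
    if start_idx_fixed is not None:
        start_idx = start_idx_fixed

    print(f"benchmarking samples [{start_idx}:{end_idx})")

    if start_idx_fixed is not None:
        break  # only the single fixed window is processed
```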
(Diffs for the remaining 18 changed files were not loaded.)