feat: update readme with json logger details #47

Merged
merged 14 commits into from
Feb 27, 2024
12 changes: 9 additions & 3 deletions README.md
@@ -84,6 +84,7 @@ For a more detailed example illustrating how multiple plots may be made for vari

In order to use the tools, raw experiment data must be in the suggested format and stored in a JSON file. If given in the correct format, `marl-eval` will aggregate experiment data, plot the results and produce aggregated tabular results as a `.csv` file, in LaTeX table formatting and in the terminal.

<a id="exp_structure"></a>
### Data Structure for Raw Experiment data 📒

In order to use the suggested tools effectively, raw data JSON files are required to have the following structure:
@@ -150,13 +151,18 @@ Here `run_1` to `run_n` correspond to the number of independent runs in a given
>
> For producing probability of improvement plots, it is important that any algorithm names in the dataset do not contain any commas.
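As a quick illustration, this constraint can be checked before plotting (an illustrative snippet, not part of `marl-eval`; the algorithm names are hypothetical):

```python
# Probability-of-improvement plots require comma-free algorithm names.
algorithm_names = ["IPPO", "MAPPO", "QMIX"]  # hypothetical dataset names
invalid = [name for name in algorithm_names if "," in name]
assert not invalid, f"Algorithm names must not contain commas: {invalid}"
```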

### JSON Data Tooling

[**JSON Logger**](marl_eval/json_tools/json_logger.py): `JsonLogger` handles logging data according to the structured format detailed [above](#exp_structure).

[**Neptune Data Pulling Script**](marl_eval/json_tools/json_utils.py): `pull_neptune_data` connects to a Neptune project, retrieves experiment data from a given list of tags and downloads it to a local directory. This function is particularly useful when there is a need to pull data from multiple experiments that were logged separately on Neptune.

[**JSON File Merging Script**](marl_eval/json_tools/json_utils.py): `concatenate_json_files` reads multiple JSON files from a specified local directory and concatenates their contents into a single structured JSON file.

> 📌 Using `pull_neptune_data` followed by `concatenate_json_files` forms an effective workflow: multiple JSON files from different experiment runs are first pulled from Neptune and then merged into a single file, ready for use in marl-eval.

For more details on how to use the JSON tools, please see the [detailed usage guide](docs/json_tooling_usage.md).

### Metrics to be normalised during data processing ⚗️
Certain metrics, such as episode returns, must be normalised during data processing. To achieve this, users should pass the relevant metric names, as a Python list of strings, to the `data_process_pipeline` function, the `create_matrices_for_rliable` function and all plotting functions. If no normalisation is required, this argument may be omitted.
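For example, the metric names might be collected once and reused across all of these calls (a sketch; the metric names shown here are assumptions, not a prescribed list):

```python
# Metrics that should be normalised during processing (assumed names).
metrics_to_normalize = ["episode_return", "win_rate"]

# This list would then be passed as an argument to data_process_pipeline,
# create_matrices_for_rliable and the plotting functions, e.g.:
# processed_data = data_process_pipeline(raw_data, metrics_to_normalize)
assert all(isinstance(m, str) for m in metrics_to_normalize)
```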

94 changes: 94 additions & 0 deletions docs/json_tooling_usage.md
@@ -0,0 +1,94 @@
# JSON tooling usage guide

## JSON logger

The JSON logger will write experiment data to JSON files in the format required for downstream aggregation and plotting with the MARL-eval tools. To initialise the logger the following arguments are required:

* `path`: the path where a file called `metrics.json` will be stored, containing all logged metrics for a given experiment. Data will be stored in `<path>/metrics.json` by default. If a JSON file already exists at a particular path, new experiment data will be appended to it. MARL-eval currently does **not** support asynchronous logging, so if you intend to run distributed experiments, please create a unique `path` per experiment and concatenate all generated JSON files after all experiments have run.
* `algorithm_name`: the name of the algorithm being run in the current experiment.
* `task_name`: the name of the task in the current experiment.
* `environment_name`: the name of the environment in the current experiment.
* `seed`: the integer value of the seed used for pseudo-randomness in the current experiment.

An example of initialising the JSON logger could look something like:

```python
from marl_eval.json_tools import JsonLogger

json_logger = JsonLogger(
    path="experiment_results",
    algorithm_name="IPPO",
    task_name="2s3z",
    environment_name="SMAX",
    seed=42,
)
```
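Since asynchronous logging is not supported, each distributed run needs its own directory. One way to build a unique `path` per experiment is sketched below (an illustrative helper, not part of the library):

```python
import time
import uuid


def make_unique_path(base: str, algorithm_name: str, seed: int) -> str:
    """Build a unique results directory name per experiment run."""
    run_id = f"{algorithm_name}_seed{seed}_{int(time.time())}_{uuid.uuid4().hex[:8]}"
    return f"{base}/{run_id}"


# Each run gets a distinct directory, so concurrent runs never share a file.
path = make_unique_path("experiment_results", "IPPO", 42)
```

After all runs finish, the per-run JSON files can be merged with `concatenate_json_files` as described below.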

To write data to the logger, the `write` method takes in the following arguments:

* `timestep`: the current environment timestep at the time of evaluation.
* `key`: the name of the metric to be logged.
* `value`: the scalar value to be logged for the current metric.
* `evaluation_step`: the number of evaluations that have been performed so far.
* `is_absolute_metric`: a boolean flag indicating whether an absolute metric is being logged.

Suppose the `4`th evaluation is being performed at environment timestep `40_000` for the `episode_return` metric with a value of `12.9`; the `write` method could then be used as follows:

```python
json_logger.write(
    timestep=40_000,
    key="episode_return",
    value=12.9,
    evaluation_step=4,
    is_absolute_metric=False,
)
```

In the case where the absolute metric for `win_rate`, with a value of `85.3`, is logged at the `200`th evaluation after `2_000_000` timesteps, the `write` method would be called as follows:

```python
json_logger.write(
    timestep=2_000_000,
    key="win_rate",
    value=85.3,
    evaluation_step=200,
    is_absolute_metric=True,
)
```
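Taken together, calls like these build up a nested record keyed by environment, task, algorithm, run and evaluation step. The snippet below mimics that shape in plain Python as an illustrative approximation of the format described in the README, not the logger's exact output:

```python
# Illustrative approximation of the nested structure the logger maintains:
# environment -> task -> algorithm -> run -> evaluation step -> metrics.
logged_data = {
    "SMAX": {
        "2s3z": {
            "IPPO": {
                "run_0": {
                    "step_4": {"step_count": 40_000, "episode_return": [12.9]},
                    "absolute_metrics": {"win_rate": [85.3]},
                }
            }
        }
    }
}

# The values correspond to the two `write` calls shown above.
episode_return = logged_data["SMAX"]["2s3z"]["IPPO"]["run_0"]["step_4"]["episode_return"]
```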

## Neptune data pulling script
The `pull_neptune_data` script will download JSON data for multiple experiment runs from Neptune given a list of one or more Neptune experiment tags. The function accepts the following arguments:

* `project_name`: the name of the Neptune project where data has been logged, given as `<workspace_name>/<project_name>`.
* `tag`: a list of Neptune experiment tags for which JSON data should be downloaded.
* `store_directory`: a local directory where downloaded JSON files should be stored.
* `neptune_data_key`: a key in a particular Neptune run where JSON data has been stored. By default this will be `metrics`, implying that the JSON file will be stored as `metrics/<metric_file_name>.zip` in a given Neptune run. For an example of how data is uploaded please see [here](https://github.com/instadeepai/Mava/blob/ce9a161a0b293549b2a34cd9a8d794ba7e0c9949/mava/utils/logger.py#L182).

In order to download data, the tool can be used as follows:

```python
from marl_eval.json_tools import pull_neptune_data

pull_neptune_data(
    project_name="DemoWorkspace/demo_project",
    tag=["experiment_1"],
    store_directory="./neptune_json_data",
)
```

## JSON file merging script
The `concatenate_json_files` function will merge all JSON files found in a given directory into a single JSON file ready to be used for downstream aggregation and plotting with MARL-eval. The function accepts the following arguments:

* `input_directory`: the path to the directory containing multiple JSON files. This directory can contain JSON files in arbitrarily nested directories.
* `output_json_path`: the path where the merged JSON file should be stored.

The function can be used as follows:

```python
from marl_eval.json_tools import concatenate_json_files

concatenate_json_files(
    input_directory="path/to/some/folder/",
    output_json_path="path/to/merged_file/folder/",
)
```
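Conceptually, the merge combines the run data from every file while keeping run identifiers unique across files. A simplified, self-contained sketch of that behaviour (not the actual implementation, which also handles nested JSON and seed checks):

```python
from typing import Dict, List


def merge_run_data(files: List[Dict]) -> Dict:
    """Merge per-run dictionaries, renaming clashing run keys."""
    merged: Dict[str, dict] = {}
    for file_data in files:
        for run_name, run_data in file_data.items():
            # If a run key already exists, append a suffix to keep it unique.
            new_name = run_name
            suffix = 1
            while new_name in merged:
                new_name = f"{run_name}_{suffix}"
                suffix += 1
            merged[new_name] = run_data
    return merged


# Two files that both logged a run named "run_0" end up as distinct runs.
merged = merge_run_data([{"run_0": {"a": 1}}, {"run_0": {"a": 2}}])
```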
1 change: 1 addition & 0 deletions marl_eval/json_tools/__init__.py
@@ -15,3 +15,4 @@

"""JSON tools for data preprocessing."""
from .json_logger import JsonLogger
from .json_utils import concatenate_json_files, pull_neptune_data
2 changes: 1 addition & 1 deletion marl_eval/json_tools/json_logger.py
@@ -83,7 +83,7 @@ def write(
        Args:
            timestep (int): the current environment timestep.
            key (str): the name of the metric to be logged.
            value (str): the value of the metric to be logged.
            value (float): the value of the metric to be logged.
            evaluation_step (int): the number of evaluations already run.
            is_absolute_metric (bool): whether the metric being logged is
                an absolute metric.
@@ -15,10 +15,13 @@

import json
import os
import zipfile
from collections import defaultdict
from typing import Dict, Tuple
from typing import Dict, List, Tuple

import neptune
from colorama import Fore, Style
from tqdm import tqdm


def _read_json_files(directory: str) -> list:
@@ -62,7 +65,7 @@ def _check_seed(concatenated_data: Dict, algo_data: Dict, seed_number: str) -> s
    return seed_number


def concatenate_files(
def concatenate_json_files(
    input_directory: str, output_json_path: str = "concatenated_json_files/"
) -> Dict:
    """Concatenate all json files in a directory and save the result in a json file."""
Expand Down Expand Up @@ -104,3 +107,52 @@ def concatenate_files(
        + f"{output_json_path}metrics.json successfully!{Style.RESET_ALL}"
    )
    return concatenated_data


def pull_neptune_data(
    project_name: str,
    tag: List,
    store_directory: str = "./downloaded_json_data",
    neptune_data_key: str = "metrics",
) -> None:
    """Pulls experiment json data from Neptune to a local directory.

    Args:
        project_name (str): Name of the Neptune project.
        tag (List): List of tags.
        store_directory (str, optional): Directory to store the data.
        neptune_data_key (str): Key in the neptune run where the json data is stored.
    """
    # Get the run ids
    project = neptune.init_project(project=project_name)
    runs_table_df = project.fetch_runs_table(state="inactive", tag=tag).to_pandas()
    run_ids = runs_table_df["sys/id"].values.tolist()

    # Check if store_directory exists
    if not os.path.exists(store_directory):
        os.makedirs(store_directory)

    # Download and unzip the data
    for run_id in tqdm(run_ids, desc="Downloading Neptune Data"):
        run = neptune.init_run(project=project_name, with_id=run_id, mode="read-only")
        for data_key in run.get_structure()[neptune_data_key].keys():
            file_path = f"{store_directory}/{data_key}"
            run[f"{neptune_data_key}/{data_key}"].download(destination=file_path)
            # Try to unzip the file else continue to the next file
            try:
                with zipfile.ZipFile(file_path, "r") as zip_ref:
                    # Create a directory to store the unzipped data
                    os.makedirs(f"{file_path}_unzip", exist_ok=True)
                    # Unzip the data
                    zip_ref.extractall(f"{file_path}_unzip")
                    # Remove the zip file
                    os.remove(file_path)
            except zipfile.BadZipFile:
                # If the file is not a zip archive it does not need
                # unzipping, so continue to the next file.
                continue
> **Collaborator:** Don't quite follow why we wouldn't also throw this as an error?
>
> **Author:** This just assumes the file doesn't have to be unzipped. All files are already downloaded before attempting to unzip them, so we just continue. The `zipfile.BadZipFile` is just a bit misleading.
>
> **Collaborator:** Ah, then can we change the docs, saying it's already unzipped instead of already downloaded. Just checking that it's placed in the same directory as the files that are unzipped?
            except Exception as e:
                print(f"An error occurred while unzipping or storing {file_path}: {e}")
        run.stop()

    print(f"{Fore.CYAN}{Style.BRIGHT}Data downloaded successfully!{Style.RESET_ALL}")
71 changes: 0 additions & 71 deletions marl_eval/json_tools/pull_neptune_data.py

This file was deleted.
