feat: update readme with json logger details #47

Merged
merged 14 commits into from
Feb 27, 2024
12 changes: 9 additions & 3 deletions README.md
Expand Up @@ -84,6 +84,7 @@ For a more detailed example illustrating how multiple plots may be made for vari

In order to use the tools, raw experiment data must be in the suggested format and stored in a json file. If given in the correct format, `marl-eval` will aggregate experiment data, plot the results and produce aggregated tabular results as a `.csv` file, in LaTeX table formatting and in the terminal.

<a id="exp_structure"></a>
### Data Structure for Raw Experiment data 📒

To use the tools effectively, raw data JSON files must have the following structure:
Expand Down Expand Up @@ -150,13 +151,18 @@ Here `run_1` to `run_n` correspond to the number of independent runs in a given
>
> For producing probability of improvement plots, it is important that any algorithm names in the dataset do not contain any commas.
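
The comma restriction above can be checked before any aggregation is attempted. The following is a minimal sketch and not part of `marl-eval`: the helper name `find_names_with_commas` is hypothetical, and it only assumes that the algorithm names in your dataset can be collected into a list of strings.

```python
from typing import Iterable, List


def find_names_with_commas(algorithm_names: Iterable[str]) -> List[str]:
    """Return the algorithm names that contain a comma (hypothetical helper)."""
    return [name for name in algorithm_names if "," in name]


# Illustrative names only; replace with the names used in your own dataset.
names = ["IPPO", "MAPPO", "QMIX, tuned"]
offending = find_names_with_commas(names)
if offending:
    print(f"Rename these algorithms before plotting: {offending}")
```

Running a check like this before producing probability of improvement plots makes it easier to spot an offending name early.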

### Data Tooling
[**Pull Neptune Data**](marl_eval/json_tools/pull_neptune_data.py): `pull_neptune_data` connects to a Neptune project, retrieves experiment data from a given list of tags and downloads it to a local directory. This function is particularly useful when there is a need to pull data from multiple experiments that were logged separately on Neptune.
### JSON Data Tooling

[**JSON Files Merging Script**](marl_eval/json_tools/merge_json_files.py): `concatenate_files` reads multiple json files from a specified local directory and concatenates their contents into a single structured dictionary, while ensuring uniqueness of seed numbers within the data. It handles nested json structures and saves the concatenated result into a new single json file for downstream aggregation and plotting.
[**JSON Logger**](marl_eval/json_tools/json_logger.py): `JsonLogger` handles logging data according to the structured format detailed [above](#exp_structure). This makes it easy to follow our evaluation protocol as files generated by the `JsonLogger` can be directly passed to the plotting tools.

[**Neptune Data Pulling Script**](marl_eval/json_tools/json_utils.py): `pull_neptune_data` connects to a Neptune project, retrieves experiment data from a given list of tags and downloads it to a local directory. This function is particularly useful when there is a need to pull data from multiple experiments that were logged separately on Neptune.

[**JSON File Merging Script**](marl_eval/json_tools/json_utils.py): `concatenate_json_files` reads multiple JSON files from a specified local directory and concatenates their contents into a single structured JSON file.

> 📌 Using `pull_neptune_data` followed by `concatenate_json_files` forms an effective workflow, where multiple JSON files from different experiment runs are first pulled from Neptune and then merged into a single file, ready for use in marl-eval.

For more details on how to use the JSON tools, please see the [detailed usage guide](docs/json_tooling_usage.md).

### Metrics to be normalised during data processing ⚗️
Certain metrics, such as episode returns, must be normalised during data processing. To enable this, pass the names of these metrics, as a Python list of strings, to the `data_process_pipeline` function, the `create_matrices_for_rliable` function and all plotting functions. If no normalisation is required, this argument may be omitted.

102 changes: 102 additions & 0 deletions docs/json_tooling_usage.md
@@ -0,0 +1,102 @@
# JSON tooling usage guide

## JSON logger

The JSON logger will write experiment data to JSON files in the format required for downstream aggregation and plotting with the MARL-eval tools. To initialise the logger the following arguments are required:

* `path`: the directory in which a file called `metrics.json`, containing all logged metrics for a given experiment, will be stored. Data will be stored in `<path>/metrics.json` by default. If a JSON file already exists at that path, new experiment data will be appended to it. MARL-eval does not currently support asynchronous logging, so if you intend to run distributed experiments, create a unique `path` per experiment and, once all experiments have finished, merge the generated JSON files with the provided `concatenate_json_files` function.
* `algorithm_name`: the name of the algorithm being run in the current experiment.
* `task_name`: the name of the task in the current experiment.
* `environment_name`: the name of the environment in the current experiment.
* `seed`: the integer value of the seed used for pseudo-randomness in the current experiment.

The JSON logger could, for example, be initialised as follows:

```python
from marl_eval.json_tools import JsonLogger

json_logger = JsonLogger(
    path="experiment_results",
    algorithm_name="IPPO",
    task_name="2s3z",
    environment_name="SMAX",
    seed=42,
)
```
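
Because asynchronous logging is not supported, each distributed run needs its own `path`. A minimal sketch of one way to derive such paths is shown here. The helper `experiment_path` and the `<algorithm>_<task>_seed_<seed>` directory naming are assumptions made for illustration, not a convention required by MARL-eval.

```python
import os


def experiment_path(base_dir: str, algorithm_name: str, task_name: str, seed: int) -> str:
    """Build a per-experiment directory name (hypothetical naming scheme)."""
    return os.path.join(base_dir, f"{algorithm_name}_{task_name}_seed_{seed}")


# One distinct directory per seed, all under the same base directory.
paths = [experiment_path("experiment_results", "IPPO", "2s3z", seed) for seed in range(3)]
print(paths)
```

Each resulting string could then be passed as the `path` argument of a separate `JsonLogger`, and the shared base directory could later be given to `concatenate_json_files`, which accepts nested directories.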

To write data to the logger, the `write` method takes in the following arguments:

* `timestep`: the current environment timestep at the time of evaluation.
* `key`: the name of the metric to be logged.
* `value`: the scalar value to be logged for the current metric.
* `evaluation_step`: the number of evaluations that have been performed so far.
* `is_absolute_metric`: a boolean flag indicating whether an absolute metric is being logged.

Suppose the `4`th evaluation is performed at environment timestep `40000` and the `episode_return` metric has a value of `12.9`. The `write` method could then be used as follows:

```python
json_logger.write(
    timestep=40_000,
    key="episode_return",
    value=12.9,
    evaluation_step=4,
    is_absolute_metric=False,
)
```

If the absolute `win_rate` metric, with a value of `85.3`, is logged at the `200`th evaluation after `2_000_000` timesteps, the `write` method would be called as follows:

```python
json_logger.write(
    timestep=2_000_000,
    key="win_rate",
    value=85.3,
    evaluation_step=200,
    is_absolute_metric=True,
)
```
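
When several metrics are evaluated at the same step, `write` is called once per metric. The sketch below illustrates this with a hypothetical `log_metrics` helper. So that it runs without an experiment or an installed `marl_eval`, it uses a stand-in `RecordingLogger` class that only mimics the `write` arguments listed above; in practice a `JsonLogger` instance would presumably be passed instead.

```python
from typing import Dict, List, Tuple


class RecordingLogger:
    """Stand-in for illustration only: mimics the `write` arguments of JsonLogger."""

    def __init__(self) -> None:
        self.records: List[Tuple[int, str, float, int, bool]] = []

    def write(
        self,
        timestep: int,
        key: str,
        value: float,
        evaluation_step: int,
        is_absolute_metric: bool,
    ) -> None:
        # Keep the logged values in memory instead of writing a JSON file.
        self.records.append((timestep, key, value, evaluation_step, is_absolute_metric))


def log_metrics(
    logger,
    metrics: Dict[str, float],
    timestep: int,
    evaluation_step: int,
    is_absolute_metric: bool = False,
) -> None:
    """Hypothetical helper: write every metric in `metrics` for one evaluation."""
    for key, value in metrics.items():
        logger.write(
            timestep=timestep,
            key=key,
            value=float(value),
            evaluation_step=evaluation_step,
            is_absolute_metric=is_absolute_metric,
        )


logger = RecordingLogger()
log_metrics(
    logger,
    {"episode_return": 12.9, "win_rate": 40.0},
    timestep=40_000,
    evaluation_step=4,
)
print(logger.records)
```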

## Neptune data pulling script
The `pull_neptune_data` function will download JSON data for multiple experiment runs from Neptune, given a list of one or more Neptune experiment tags. The function accepts the following arguments:

* `project_name`: the name of the Neptune project where data has been logged, given as `<workspace_name>/<project_name>`.
* `tag`: a list of Neptune experiment tags for which JSON data should be downloaded.
* `store_directory`: a local directory where downloaded JSON files should be stored.
* `neptune_data_key`: a key in a particular Neptune run where JSON data has been stored. By default this will be `metrics` implying that the JSON file will be stored as `metrics/<metric_file_name>.zip` in a given Neptune run. For an example of how data is uploaded please see [here](https://github.com/instadeepai/Mava/blob/ce9a161a0b293549b2a34cd9a8d794ba7e0c9949/mava/utils/logger.py#L182).

In order to download data, the tool can be used as follows:

```python
from marl_eval.json_tools import pull_neptune_data

pull_neptune_data(
    project_name="DemoWorkspace/demo_project",
    tag=["experiment_1"],
    store_directory="./neptune_json_data",
)
```

## JSON file merging script
The `concatenate_json_files` function will merge all JSON files found in a given directory into a single JSON file ready to be used for downstream aggregation and plotting with MARL-eval. The function accepts the following arguments:

* `input_directory`: the path to the directory containing multiple JSON files. This directory can contain JSON files in arbitrarily nested directories.
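* `output_json_path`: the directory where the merged JSON file should be stored. The success message printed by the function refers to the merged file as `<output_json_path>metrics.json`, so the path should end with a trailing slash.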

The function can be used as follows:

```python
from marl_eval.json_tools import concatenate_json_files

concatenate_json_files(
    input_directory="path/to/some/folder/",
    output_json_path="path/to/merged_file/folder/",
)
```
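
Since `input_directory` may contain JSON files in arbitrarily nested directories, it can be useful to preview which files would be found before merging. The sketch below is not the library's implementation: `list_json_files` is a hypothetical helper built on the standard library's `pathlib`, and the temporary directory layout is made up for the demonstration.

```python
import tempfile
from pathlib import Path
from typing import List


def list_json_files(directory: str) -> List[Path]:
    """Recursively list all `.json` files under `directory`, sorted by path."""
    return sorted(Path(directory).rglob("*.json"))


# Demonstration on a throwaway directory tree with one nested JSON file.
with tempfile.TemporaryDirectory() as root:
    nested = Path(root) / "experiment_1" / "seed_0"
    nested.mkdir(parents=True)
    (nested / "metrics.json").write_text("{}")
    (Path(root) / "notes.txt").write_text("not a JSON file")

    found = [path.relative_to(root).as_posix() for path in list_json_files(root)]
    print(found)
```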

## An example use case:
* Run 10 independent trials of an experiment on different cloud machines with different seeds.
* Log each experiment using the `JsonLogger` to its own path, e.g. `metrics/experiment_<i>`.
* Push these JSON logs to Neptune.
* Retrieve all the JSON logs locally using the `pull_neptune_data` function.
* Merge all the JSON logs using the `concatenate_json_files` function.
* Use the plotting tools to visualize the full experiment results.
1 change: 1 addition & 0 deletions marl_eval/json_tools/__init__.py
Expand Up @@ -15,3 +15,4 @@

"""JSON tools for data preprocessing."""
from .json_logger import JsonLogger
from .json_utils import concatenate_json_files, pull_neptune_data
2 changes: 1 addition & 1 deletion marl_eval/json_tools/json_logger.py
Expand Up @@ -83,7 +83,7 @@ def write(
Args:
timestep (int): the current environment timestep.
key (str): the name of the metric to be logged.
value (str): the value of the metric to be logged.
value (float): the value of the metric to be logged.
evaluation_step (int): the number of evaluations already run.
is_absolute_metric (bool): whether the metric being logged is
an absolute metric.
Expand Up @@ -15,10 +15,13 @@

import json
import os
import zipfile
from collections import defaultdict
from typing import Dict, Tuple
from typing import Dict, List, Tuple

import neptune
from colorama import Fore, Style
from tqdm import tqdm


def _read_json_files(directory: str) -> list:
Expand Down Expand Up @@ -62,7 +65,7 @@ def _check_seed(concatenated_data: Dict, algo_data: Dict, seed_number: str) -> s
return seed_number


def concatenate_files(
def concatenate_json_files(
input_directory: str, output_json_path: str = "concatenated_json_files/"
) -> Dict:
"""Concatenate all json files in a directory and save the result in a json file."""
Expand Down Expand Up @@ -104,3 +107,56 @@ def concatenate_files(
+ f"{output_json_path}metrics.json successfully!{Style.RESET_ALL}"
)
return concatenated_data


def pull_neptune_data(
    project_name: str,
    tag: List,
    store_directory: str = "./downloaded_json_data",
    neptune_data_key: str = "metrics",
) -> None:
    """Pulls experiment json data from Neptune to a local directory.

    Args:
        project_name (str): Name of the Neptune project.
        tag (List): List of tags for the experiment(s) that contain the
            desired JSON files.
        store_directory (str, optional): Directory to store the data.
            Default: ./downloaded_json_data.
        neptune_data_key (str, optional): Key in the neptune run where the
            json data is stored. Default: metrics.
    """
    # Get the run ids
    project = neptune.init_project(project=project_name)
    runs_table_df = project.fetch_runs_table(state="inactive", tag=tag).to_pandas()
    run_ids = runs_table_df["sys/id"].values.tolist()

    # Check if store_directory exists
    if not os.path.exists(store_directory):
        os.makedirs(store_directory)

    # Download and unzip the data
    for run_id in tqdm(run_ids, desc="Downloading Neptune Data"):
        run = neptune.init_run(project=project_name, with_id=run_id, mode="read-only")
        for data_key in run.get_structure()[neptune_data_key].keys():
            file_path = f"{store_directory}/{data_key}"
            run[f"{neptune_data_key}/{data_key}"].download(destination=file_path)
            # Try to unzip the file else continue to the next file
            try:
                with zipfile.ZipFile(file_path, "r") as zip_ref:
                    # Create a directory to store unzipped data
                    os.makedirs(f"{file_path}_unzip", exist_ok=True)
                    # Unzip the data
                    zip_ref.extractall(f"{file_path}_unzip")
                # Remove the zip file
                os.remove(file_path)
            except zipfile.BadZipFile:
                # If the file is not zipped continue to the next file
                # as it is already downloaded and doesn't need to be
                # unzipped.
                continue
            except Exception as e:
                print(f"An error occurred while unzipping or storing {file_path}: {e}")
        run.stop()

    print(f"{Fore.CYAN}{Style.BRIGHT}Data downloaded successfully!{Style.RESET_ALL}")
71 changes: 0 additions & 71 deletions marl_eval/json_tools/pull_neptune_data.py

This file was deleted.
