Add tools to execute a series of experiments on the same DO setup #33

Open
wants to merge 43 commits into
base: main
Commits
cad238c
Testing with message skipping
lasarojc Sep 7, 2023
fea61ad
updating manifest format
lasarojc Sep 12, 2023
4b8760e
do not reset prometheus when resetting the network.
lasarojc Sep 20, 2023
9607401
do not reset prometheus when resetting the network
lasarojc Sep 20, 2023
d19a7f8
Addin the runtest.py script to perform multiple tests in sequence in DO
lasarojc Sep 28, 2023
ff978b1
retrieve the data at the end
lasarojc Sep 28, 2023
2b50ce4
Updating the runtests script and ansible task
lasarojc Oct 6, 2023
002a041
flood_skip templates and options
lasarojc Oct 10, 2023
291c7cb
Separates experiments configuration from the main makefile, which has…
lasarojc Nov 1, 2023
a2cac3b
Updating the limit_peers configs to reflect new paramenter name
lasarojc Nov 6, 2023
00ffe09
Renaming the prometheus restart option
lasarojc Nov 7, 2023
cabb8d1
clean up
lasarojc Nov 17, 2023
58ea17f
clean up
lasarojc Nov 17, 2023
c6c99e8
clean up
lasarojc Nov 17, 2023
158afe3
clean up
lasarojc Nov 17, 2023
07d9546
clean up
lasarojc Nov 17, 2023
fed0cf6
clean up
lasarojc Nov 17, 2023
a49207e
Merge branch 'main' into lasaro/msg_skip
lasarojc Nov 17, 2023
dfa30e9
Update Makefile
lasarojc Nov 17, 2023
5782068
Update README.md
lasarojc Nov 24, 2023
152ef21
Update script/runtests/runtests.py
lasarojc Nov 24, 2023
b4a791d
Fix the numbering
lasarojc Nov 24, 2023
54d29db
Merge branch 'main' into lasaro/msg_skip
lasarojc Nov 24, 2023
b36fb79
Fix the get-endpoints.sh script to only get validator or full nodes
lasarojc Nov 28, 2023
7df4e6b
reverting adding runtests.py
lasarojc Jan 8, 2024
a1f98b1
use a single endpoint
lasarojc Jan 8, 2024
d215efa
Use experiment.mk instead of Makefile to configure experiment. Use sc…
lasarojc Jan 8, 2024
381bc8e
Bringing runtests.py back
lasarojc Jan 8, 2024
6232b14
fix typo
lasarojc Jan 8, 2024
86b5970
Merge branch 'main' into lasarojc/runtests
lasarojc Jan 10, 2024
7674ac4
Merge branch 'main' into lasarojc/runtests
lasarojc Feb 9, 2024
748ca48
Merge branch 'main' into lasarojc/runtests
lasarojc Mar 6, 2024
e219d80
Merge branch 'main' into lasarojc/runtests
lasarojc Mar 6, 2024
cb9e589
Merges instructions
lasarojc Mar 14, 2024
f8d5d0d
Renaming template files
lasarojc Mar 14, 2024
81f6aae
Add a TODO to use argparse.
lasarojc Mar 18, 2024
f0d2176
Merge branch 'main' into lasarojc/runtests
lasarojc Mar 18, 2024
a072b77
update instructions
lasarojc Mar 18, 2024
1a2425c
Merge branch 'main' into lasarojc/runtests
lasarojc Mar 18, 2024
4c0242c
Optimized the setting of the load-runner
lasarojc Mar 19, 2024
8304883
Fixed the steps to run with runtests.py
lasarojc Mar 19, 2024
2bc8700
Merge branch 'main' into lasarojc/runtests
lasarojc Mar 28, 2024
0c8477b
test: Adds testing with variable intra loop sleep delay maximum value…
lasarojc Mar 28, 2024
4 changes: 4 additions & 0 deletions Makefile
@@ -87,6 +87,10 @@ endif
ansible-playbook ./ansible/prometheus-restart.yaml
ansible-playbook ./ansible/testapp-reinit.yaml

.PHONY: restart2
restart2:
ansible-playbook ./ansible/testapp-reinit.yaml

.PHONY: rotate
rotate:
./script/rotate.sh $(RUNNER_COMMIT_HASH) $(MANIFEST_PATH) \
84 changes: 73 additions & 11 deletions README.md
@@ -51,7 +51,49 @@ After you have all the prerequisites installed:

### Start the network

After you have set up the infrastructure:
After you have set up the infrastructure, you need to set up the experiment.
There are two ways of doing this: with the script `script/runtests/runtests.py`, which automates part of the process, or manually.

#### Using `script/runtests/runtests.py`

Execute the script once to update your experiment setup according to your templates.
Use the `-s` flag to run it just once, as in the following.

```bash
python3 runtests.py -l log.log -o flood_options.json -s
```

1. Create the VMs for the validators and Prometheus as specified in the manifest file.
Be sure to use your actual DO token and SSH key fingerprints for the `do_token` and `do_ssh_keys` variables.

```bash
make terraform-apply
```

After creating the DO droplets, this command will generate two files with information about the
IP addresses of the nodes: an Ansible inventory file `./ansible/hosts`, and
`./ansible/testnet/infrastructure-data.json` for E2E's `runner` tool.

2. Generate the testnet configuration, using the updated scripts

```bash
make configgen
```

3. Install all necessary software on the created VMs using Ansible

```bash
make ansible-install
```

4. Initialize the Prometheus instance

```bash
make prometheus-init
```


#### Without the script

5. Set up the test you will run in the `experiment.mk` file:
1. Set the path to your manifest file in the variable `MANIFEST`.
@@ -61,10 +103,10 @@ After you have set up the infrastructure:
the proportion of nodes that will run `VERSION_TAG` and `VERSION2_TAG` in the variables
`VERSION_WEIGHT` and `VERSION2_WEIGHT` respectively.
4. If necessary, set the variables `DO_INSTANCE_TAGNAME` and `DO_VPC_SUBNET` to customized
values to prevent collisions with other QA runs, including possible other users of the
DigitalOcean project who might be running these scripts. If the subnet is allocated in the
private IP address range 172.16.0.0/12, as it is in the unmodified file, a good choice should be
in the range 172.16.16.0/20 - 172.31.240.0/20. You may also need to rename the DO project
values to prevent collisions with other QA runs, including possible other users of the
DigitalOcean project who might be running these scripts. If the subnet is allocated in the
private IP address range 172.16.0.0/12, as it is in the unmodified file, a good choice should be
in the range 172.16.16.0/20 - 172.31.240.0/20. You may also need to rename the DO project
`cmt-testnet` in the `tf/project.tf` file to a unique name.

6. Create the VMs for the validators and Prometheus as specified in the manifest file.
@@ -107,13 +149,20 @@ After you have set up the infrastructure:
```

### Execute the load test
If you are using `script/runtests/runtests.py`, run it now.

```bash
python3 runtests.py -l log.log -o flood_options.json
```

If you are not using the script, first initialize the load-runner node, if it is not yet running:

Initialize the load-runner node, if not it's yet running:
```bash
make loadrunners-init
```

The following command will start sending load until Ctrl-C is sent, so consider running this in its own terminal:

```bash
make runload
```
@@ -127,19 +176,20 @@ make runload
```

12. Retrieve the data produced during the execution.
You can either use the following command to retrieve both the prometheus and the blockstore databases together
If you have used `runtests.py`, the data may have been retrieved already.
Otherwise, you can either use the following command to retrieve both the prometheus and the blockstore databases together

```bash
make retrieve-data
```

To retrieve them independently use the following for prometheus, which will retrieve the data from all nodes.
or, to retrieve them independently, use the following for prometheus, which will retrieve the data from all nodes,

```bash
make retrieve-prometheus-data
```

For blockstore, use the following. Here, notice that the target node from which the data is retrieved can be changed via the environment variable `RETRIEVE_TARGET_HOST`.
and, for the blockstore, use the following. Here, notice that the target node from which the data is retrieved can be changed via the environment variable `RETRIEVE_TARGET_HOST`.
- `"any"` (default) - retrieve from one random validator from the inventory.
- `"all"` - retrieve from all nodes (very slow!);
- set it to the exact name of a validator to retrieve from that particular validator.
@@ -155,12 +205,24 @@ make runload
If you need to restart the running experiment, run the following command:

```sh
# Modify your testnet.toml file
# Update the configuration files locally
make configgen
# Update the configuration files and restart CometBFT in the nodes
make restart
# Reset and restart prometheus
make restart-prometheus
```

This command will delete all of the prometheus data, and re-initialize the nodes
on the network. The nodes will restart with the same configuration files and
IDs that they previously used, but all of their data will be deleted and reset.
on the network. The nodes will restart with the new configuration and all of their
data will be deleted and reset, but they will use the same IDs that they previously used.

If you want to rerun experiments with the same configuration, without updating the
configuration files, you can omit the `make configgen` step.

If you want to collect the metrics of multiple experiments in the same Prometheus database,
you can omit the `make restart-prometheus` command.
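
The choice of which restart steps to run can be sketched as follows. This is only an illustration: the helper names `restart_plan` and `run_plan` are hypothetical and not part of the repository, and the only commands used are the three `make` targets listed above.

```python
import subprocess


def restart_plan(update_config=True, reset_prometheus=True):
    """Return the make invocations for one restart, in the order given above."""
    plan = []
    if update_config:
        # Regenerate the configuration files locally.
        plan.append(["make", "configgen"])
    # Push the configuration files and restart CometBFT in the nodes.
    plan.append(["make", "restart"])
    if reset_prometheus:
        # Reset and restart Prometheus.
        plan.append(["make", "restart-prometheus"])
    return plan


def run_plan(plan):
    """Run each command in turn, stopping at the first failure."""
    for command in plan:
        subprocess.run(command, check=True)


# Rerun with the same configuration, keeping the metrics of earlier runs.
plan = restart_plan(update_config=False, reset_prometheus=False)
print(plan)
```

Calling `run_plan(plan)` from the repository root would then execute the selected steps; it is left uncalled here so that the sketch has no side effects.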

### Destroy the network

2 changes: 1 addition & 1 deletion ansible/scripts/get-endpoints.sh
@@ -1,4 +1,4 @@
NUM_CONNECTIONS=1
NUM_CONNECTIONS=4 #TODO: pass this value from Makefile
endpoint_list=$(ansible-inventory --export --list | jq '[.[] | .hostvars][0]' | grep 'name.*validator' -B 1 | grep internal_ip | sed 's/\"//g' | cut -w -f3 | sed 's/,//' | sed 's/\(.*\)/ws:\/\/\1:26657\/v1\/websocket/')

IFS='
8 changes: 4 additions & 4 deletions experiment.mk
@@ -1,12 +1,12 @@
# Take care to make these values unique between experiments running
# on the same DigitalOcean project.
DO_INSTANCE_TAGNAME=main-testnet
DO_VPC_SUBNET=172.19.144.0/20
DO_INSTANCE_TAGNAME=main-testnet-lasaro
DO_VPC_SUBNET=172.31.240.0/20

MANIFEST ?= ./testnets/example.toml
MANIFEST ?= ./testnet.toml
MANIFEST_PATH=$(shell realpath $(MANIFEST))

VERSION_TAG ?= f92bace91 # tag of main on 05.02.2024
VERSION_TAG ?= 72450bc82902c8c3f5995da116454c067c0d3373
#VERSION_TAG ?= 3b783434f #v0.34.27 (cometbft/cometbft)
#VERSION_TAG ?= bef9a830e #v0.37.alpha3 (cometbft/cometbft)
#VERSION_TAG ?= v0.38.0-alpha.2
98 changes: 98 additions & 0 deletions script/runtests/README.md
@@ -0,0 +1,98 @@
The `runtests.py` script allows you to configure and execute a series of experiments in sequence, on the same DO setup, to make comparisons fair.
This is achieved by replacing tags in template files for `../../testnet.toml` and `../../experiment.mk` with combinations of the values specified
in an `options` file.
For each combination, the `runtests.py` script invokes the make commands in the `Makefile` to recreate the node configuration,
clean up the nodes (but not the Prometheus server), push the new configuration, and run the experiments.

## Configuration
To use `runtests.py`, create an `options` file that specifies which template files should be taken as input, which tags should be replaced by which values, and which files will be generated as output.

For example, consider the contents of the `sampl_tmpl.toml` file.

```toml
config1 = {{conf 1 var 2}}
config2 = "{{conf 1 var 1}}"
config3 = {{conf 2 var 1}}
config4 = {{conf 2 var 2}}
```

It defines 4 tags, `{{conf 1 var 2}}`, `{{conf 1 var 1}}`, `{{conf 2 var 1}}` and `{{conf 2 var 2}}`.
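
A minimal sketch of how such tag replacement could work, assuming tags are delimited by `{{` and `}}` as in the template above. The function `render_template` is a hypothetical name used for illustration and is not taken from `runtests.py`, whose actual implementation may differ.

```python
import re

# A tag is any text enclosed in double curly braces, e.g. {{conf 1 var 1}}.
TAG_PATTERN = re.compile(r"\{\{(.*?)\}\}")


def render_template(template, values):
    """Replace every {{tag}} in `template` with values[tag].

    Tags without an entry in `values` are left untouched.
    """
    def substitute(match):
        tag = match.group(1).strip()
        return values.get(tag, match.group(0))

    return TAG_PATTERN.sub(substitute, template)


template = (
    'config1 = {{conf 1 var 2}}\n'
    'config2 = "{{conf 1 var 1}}"\n'
    'config3 = {{conf 2 var 1}}\n'
    'config4 = {{conf 2 var 2}}\n'
)
values = {
    "conf 1 var 1": "c1v1 0",
    "conf 1 var 2": "c1v2 0",
    "conf 2 var 1": "c2v1 0",
    "conf 2 var 2": "c2v2 0",
}
rendered = render_template(template, values)
print(rendered)
```

Leaving unknown tags in place, rather than raising an error, is a design choice of this sketch: it lets several passes fill in one template.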

Now consider the contents of file `example_options.json`, which defines how these tags will be replaced.
The `sequences` field specifies two independent experiments, `seq 1` and `seq 2`,
which will be executed one after the other.

`seq 1` specifies a series of `configurations`: `conf 1`, `conf 2` and `conf 3`.
Each configuration has a set of tags that will be associated with different values.
The resulting associations will be combined into full configurations.
That is, all values of `conf 1` will be combined with all values of `conf 2`
and the result will be combined with all values of `conf 3`.

For example, `conf 1` specifies that tag `conf 1 var 1` will be associated first with value `c1v1 0` and then with `c1v1 1`.
Both values will be used in combination with the associations made by `conf 2` and `conf 3`.

The sets of values associated with the tags inside a `zip_vars` field (so called for lack of a better name) are applied simultaneously
and in the same order. In the example, when tag `conf 1 var 1` is associated with `c1v1 0`, `conf 1 var 2` will be associated with `c1v2 0`,
and when tag `conf 1 var 1` is associated with `c1v1 1`, `conf 1 var 2` will be associated with `c1v2 1`.
Observe that the association happens in the context of the same file, `sampl_tmpl.toml`, but this need not be the case.

```json
{
"comment": "Entries are processed sequentially",
"sequences": [
{
"name": "seq 1",
"configurations": [
{
"name": "conf 1",
"zip_vars": [
{
"tmpl_file": "sampl_tmpl.toml",
"output_file": "sampl.out",
"tag": "conf 1 var 1",
"values": [
"c1v1 0",
"c1v1 1"
]
},
{
"tmpl_file": "sampl_tmpl.toml",
"output_file": "sampl.out",
"tag": "conf 1 var 2",
"values": [
"c1v2 0",
"c1v2 1"
]
}
]
},
{
"name": "conf 2",
...
},
{
"name": "conf 3",
...
}
]
},
{
"name": "seq 2",
...
}
]
}
```
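
The combination rules described above can be sketched as follows: values inside one `zip_vars` list are paired by position, and the resulting assignments of different configurations are combined as a Cartesian product. The function names are hypothetical, the sample omits `tmpl_file` and `output_file` for brevity, and the use of `zip` (which stops at the shortest value list) is an assumption; `runtests.py` itself may behave differently.

```python
from itertools import product


def expand_configuration(configuration):
    """Zip the value lists of one configuration.

    The i-th assignment maps every tag in `zip_vars` to its i-th value.
    Assumption: `zip` stops at the shortest list.
    """
    entries = configuration["zip_vars"]
    tags = [entry["tag"] for entry in entries]
    value_lists = [entry["values"] for entry in entries]
    return [dict(zip(tags, values)) for values in zip(*value_lists)]


def expand_sequence(sequence):
    """Combine each assignment of a configuration with each assignment of the others."""
    per_configuration = [expand_configuration(c) for c in sequence["configurations"]]
    experiments = []
    for choice in product(*per_configuration):
        merged = {}
        for assignment in choice:
            merged.update(assignment)
        experiments.append(merged)
    return experiments


sequence = {
    "name": "seq 1",
    "configurations": [
        {
            "name": "conf 1",
            "zip_vars": [
                {"tag": "conf 1 var 1", "values": ["c1v1 0", "c1v1 1"]},
                {"tag": "conf 1 var 2", "values": ["c1v2 0", "c1v2 1"]},
            ],
        },
        {
            "name": "conf 2",
            "zip_vars": [
                {"tag": "conf 2 var 1", "values": ["c2v1 0", "c2v1 1"]},
            ],
        },
    ],
}

experiments = expand_sequence(sequence)
for experiment in experiments:
    print(experiment)
```

With two zipped assignments in `conf 1` and two in `conf 2`, this yields four tag-to-value mappings, each of which would correspond to one experiment run.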

## Templates

The `reactors` files have real examples of experiment configurations.

## Execution

```bash
cd script/runtests
python3 runtests.py -l log.log -o flood_skip_options.json -r -t log.log
```

117 changes: 117 additions & 0 deletions script/runtests/example.json
@@ -0,0 +1,117 @@
{
"comment": "Entries are processed sequentially",
"sequences": [
{
"name": "seq 1",
"configurations": [
{
"name": "conf 1",
"zip_vars": [
{
"tmpl_file": "example_tmpl.toml",
"output_file": "example_out.toml",
"tag": "conf 1 var 1",
"values": [
"c1v1 0",
"c1v1 1"
]
},
{
"tmpl_file": "example_tmpl.toml",
"output_file": "example_out.toml",
"tag": "conf 1 var 2",
"values": [
"c1v2 0",
"c1v2 1"
]
}
]
},
{
"name": "conf 2",
"zip_vars": [
{
"tmpl_file": "example_tmpl.toml",
"output_file": "example_out.toml",
"tag": "conf 2 var 1",
"values": [
"c2v1 0",
"c2v1 1"
]
},
{
"tmpl_file": "example_tmpl.toml",
"output_file": "example_out.toml",
"tag": "conf 2 var 2",
"values": [
"c2v2 0",
"c2v2 1",
"c2v2 2"
]
}
]
}
]
},
{
"name": "seq 2",
"configurations": [
{
"name": "conf 1",
"zip_vars": [
{
"tmpl_file": "example_tmpl.toml",
"output_file": "example_out.toml",
"tag": "conf 1 var 1",
"values": [
"c1v1 0",
"c1v1 1"
]
},
{
"tmpl_file": "example_tmpl.toml",
"output_file": "example_out.toml",
"tag": "conf 1 var 2",
"values": [
"c1v2 0",
"c1v2 1"
]
}
]
},
{
"name": "conf 2",
"zip_vars": [
{
"tmpl_file": "example_tmpl.toml",
"output_file": "example_out.toml",
"tag": "conf 2 var 1",
"values": [
"c2v1 0",
"c2v1 1"
]
},
{
"tmpl_file": "example_tmpl.toml",
"output_file": "example_out.toml",
"tag": "conf 2 var 2",
"values": [
"c2v2 0",
"c2v2 1"
]
},
{
"tmpl_file": "exampleb_tmpl.toml",
"output_file": "exampleb_out.toml",
"tag": "conf 2 var 3",
"values": [
"c2v3 0",
"c2v3 1"
]
}
]
}
]
}
]
}