forked from ChorusOne/blazar

Blazar: Automatic Cosmos SDK Network Upgrades

Life is too short to wait for the upgrade block height!

Getting Started · CLI · Web UI · Proxy UI · Slack · FAQ

What is Blazar?

Blazar is a standalone application designed to automate network upgrades for Cosmos SDK based blockchain networks.

Web UI

The Need for Blazar

At Chorus One, we manage over 60 blockchain networks, many of which are part of the Cosmos Ecosystem. Each network has its own upgrade schedule, which can vary from monthly to bi-weekly, depending on the urgency of the upgrade and Cosmos SDK releases. Our 24/7 on-call team handles multiple upgrades weekly.

The upgrade process is generally straightforward but can be time-consuming. Here's how it typically works:

  1. An upgrade is announced via a governance proposal or other communication channels (Discord, Telegram, Slack, Email, etc.).
  2. The upgrade details specify the block height and the version network operators should use.
  3. At the specified block height, the node halts, and operators must upgrade the binary and restart the node(s).
  4. While waiting for consensus, operators often exchange progress updates on Discord.
  5. Once the upgrade is successful, operators return to their regular tasks.

Blazar was created to automate this process, allowing our team to use their time more productively. It currently handles the majority of network upgrades for Cosmos Networks at Chorus One.

Key Features

  • Upgrade Feeds: Fetch upgrade information from multiple sources like "Governance", "Database", and "Local".
  • Upgrade Strategies: Supports various upgrade scenarios, including height-specific and manually coordinated upgrades.
  • Pre and Post Upgrade Checks: Automate checks such as Docker image existence and node and consensus status.
  • Stateful Execution: Tracks upgrade stages to ensure consistent execution flow.
  • Cosmos SDK Gov/Upgrade Module Compliance: Understands and respects the Cosmos SDK governance module upgrades.
  • Slack Integration: Optional Slack notifications for every action Blazar performs.
  • Modern Stack: Includes CLI, UI, REST, gRPC, Prometheus metrics, and Protobuf.
  • Built by Ops Team: Developed by individuals with firsthand experience in node operations.

Comparison to Cosmovisor

While many operators use Cosmovisor with systemd services, this setup doesn't meet our specific needs. Instead of relying on GitHub releases, we build our own binaries, ensuring a consistent build environment with Docker. This approach allows us to use exact software versions and generate precise build artifacts (e.g., libwasmvm.so).

Cosmovisor is designed to run as the parent process of a validator node, replacing node binaries at the upgrade height. However, this model isn't compatible with Docker Compose managed services. To address this, we developed Blazar as a more effective solution tailored to our setup.

Note: If you'd like Blazar to work with systemd services, contributions are welcome!

| | Blazar | Cosmovisor |
|---|---|---|
| Control plane | Docker | Fork/Exec |
| Upgrade mechanism | Image Tag Update | Replace Binary |
| Configuration | TOML (Blazar) + YAML (docker-compose.yml) | Custom directory structure |
| Upgrade strategy | Gov, Coordinated, Uncoordinated | Gov, Coordinated, Uncoordinated |
| Upgrade scope | Single, Multi-node* | Single node |
| Pre checks | ✔️ | ✔️ (preupgrade.sh) |
| Post checks | ✔️ | |
| Metrics | ✔️ | |
| Notifications | ✔️ (Slack) | |
| UI + REST + RPC | ✔️ | |
| CLI | ✔️ | ✔️ |
| Upgrade Feeds | Governance, Database, Local | Governance, Local** |

* DATABASE registered upgrades are executed by multiple nodes feeding from the provider

** For Cosmovisor everything looks as if it was scheduled through governance

How Blazar Works

Blazar Under the Hood

Blazar constructs a prioritized list of upgrades from multiple providers and takes appropriate actions based on the most recent state. It retrieves block heights from WSS endpoints or periodic gRPC polls and triggers Docker components when the upgrade height is reached. Notifications are sent to logs and Slack (if configured).

In simple terms, Blazar performs the following steps:

  1. Upgrade List Construction: Blazar compiles a unified list of upgrades from various providers (database, local, chain). When providers disagree, the entry from the provider with the highest priority wins.
  2. State Evaluation & Action: The Blazar daemon reads this list in conjunction with the most recent state, taking relevant actions, such as performing a pre-upgrade check or finalizing the upgrade process.
  3. Block Height Detection: The daemon tracks block heights via WSS endpoints or periodic gRPC polls.
  4. Upgrade Execution: When the upgrade height is reached, the corresponding Docker components are executed.
  5. Notification Delivery: Blazar sends notifications to logs and Slack (if configured).
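
The height-triggered part of the steps above can be sketched in Go. This is a minimal illustration and not Blazar's actual code: the `Upgrade` struct, its fields, and the `nextDue` function are hypothetical names chosen for this example, and state tracking (for instance, skipping upgrades that were already executed) is left out.

```go
package main

import "fmt"

// Upgrade is a hypothetical, simplified record of a scheduled upgrade.
type Upgrade struct {
	Height int64  // block height at which the upgrade should run
	Tag    string // Docker image tag to switch to
}

// nextDue returns the lowest-height upgrade whose height has been reached
// by currentHeight, and false if no upgrade is due yet.
func nextDue(upgrades []Upgrade, currentHeight int64) (Upgrade, bool) {
	var due Upgrade
	found := false
	for _, u := range upgrades {
		if u.Height > currentHeight {
			continue // upgrade height not reached yet
		}
		if !found || u.Height < due.Height {
			due, found = u, true
		}
	}
	return due, found
}

func main() {
	upgrades := []Upgrade{
		{Height: 13261400, Tag: "4.2.0"},
		{Height: 14000000, Tag: "5.0.0"},
	}
	// In a real daemon the heights would arrive from a WSS subscription
	// or a periodic gRPC poll; here they are hard-coded.
	for _, h := range []int64{13261399, 13261400} {
		if u, ok := nextDue(upgrades, h); ok {
			fmt.Printf("height %d: run upgrade to %s\n", h, u.Tag)
		} else {
			fmt.Printf("height %d: nothing to do\n", h)
		}
	}
}
```

The sketch only answers "is anything due at this height?". Pre-upgrade checks, the Docker Compose restart, and notifications would hang off the `ok` branch.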

While the logic is simple, it's important to understand the differences between the types of upgrades:

  1. Governance: A coordinated upgrade initiated by chain governance, expected to be executed by all validators at a specified block height.
  2. Non Governance Coordinated: An upgrade initiated by operators rather than by the chain, but still expected to occur at the same block height across all validators.
  3. Non Governance Uncoordinated: An operator-initiated upgrade, independent of chain governance, that can be executed at any time.

NOTE: Blazar does one job and does it well, meaning you need one Blazar instance per Cosmos SDK node.

NOTE: You are free to choose your upgrade proposal providers. An SQL database is not mandatory - you can opt to use the "LOCAL" provider or both simultaneously, depending on your needs.

Getting Started

To use Blazar, first build the binary with the Go compiler, then deploy it on a host with Docker Compose installed.

$ apt-get install golang
$ apt-get install docker-compose

Configure and run Blazar:

$ cp blazar.sample.toml blazar.toml
$ make build
$ ./blazar run --config blazar.toml

Requirements: Docker & Docker Compose

Blazar is designed to work with nodes configured and spawned via Docker Compose.

CLI & REST Interface

Register or list upgrades using the CLI:

$ ./blazar upgrades list --host 127.0.0.1 --port 5678
... table with upgrades ...

$ ./blazar upgrades register --height "13261400" --tag '4.2.0' --type NON_GOVERNANCE_COORDINATED --source DATABASE --host 127.0.0.1 --port 5678 --name 'security upgrade'

Or use the REST interface:

curl -s http://127.0.0.1:1234/v1/upgrades/list
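
The same REST call can be made from Go. The sketch below assumes only what the curl example shows, namely the `/v1/upgrades/list` path on `127.0.0.1:1234`. The response format is not described here, so the body is printed as-is. The helper names `upgradesURL` and `listUpgrades` are made up for this example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// upgradesURL builds the list endpoint URL used in the curl example.
func upgradesURL(host string, port int) string {
	return fmt.Sprintf("http://%s:%d/v1/upgrades/list", host, port)
}

// listUpgrades fetches the raw response body from the list endpoint.
// The body is returned unparsed because its schema is not assumed here.
func listUpgrades(url string) (string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	body, err := listUpgrades(upgradesURL("127.0.0.1", 1234))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(body)
}
```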

Slack Integration

Track the upgrade process in a single Slack thread 🧵.

Slack Notifications

Proxy UI

Blazar Proxy consolidates the latest updates from all Blazar instances. Here's how you can run it:

$ cp proxy.sample.toml proxy.toml
$ ./blazar proxy --config proxy.toml

Proxy UI

Frequently Asked Questions

Why do I need to register a version tag separately?

Cosmos SDK software upgrade proposals don't explicitly specify the version you must upgrade to. The version can only be inferred from the free-form data within the proposal, such as:

  1. A link to the binary release (if present).
  2. The proposal title.
  3. The human-written text.

Currently, Blazar does not infer which version should be used. As a network operator, you must provide a version tag; otherwise, Blazar will skip the upgrade.

What are the upgrade priorities, and why do I need them?

Consider a scenario where a network operator runs three nodes. The first node uses an image with a patch (e.g., PebbleDB support), while the other two run vanilla upstream images.

In this configuration, Blazar uses three upgrade sources:

  • CHAIN (priority 1)
  • DATABASE (priority 2)
  • LOCAL (priority 3)

All three Blazar instances detect a new upgrade from CHAIN. The operator registers a new version in the DATABASE so that every instance knows what to pick up. However, one node requires a patched version. The network operator must register a new version in the LOCAL provider.

Now, the first node sees two different versions from two providers (DATABASE & LOCAL). Which one should it use? The one with the higher priority: LOCAL (priority 3) wins over DATABASE (priority 2).

The end state on each Blazar node is:

  1. Node 1 - v1.0.0-patched, priority 3
  2. Nodes 2 & 3 - v1.0.0, priority 2

The same logic applies to upgrade entries and versions.
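
The resolution rule from this scenario can be sketched in a few lines of Go. This is an illustration under assumptions, not Blazar's implementation: the `Version` struct, its fields, the `resolve` function, and the height value `100` are hypothetical. Only the provider names, tags, and priorities come from the example above.

```go
package main

import "fmt"

// Version is a hypothetical record: a version tag registered for an upgrade
// height by one provider, together with that provider's priority.
type Version struct {
	Height   int64
	Tag      string
	Source   string // "CHAIN", "DATABASE" or "LOCAL"
	Priority int
}

// resolve keeps, for each height, the entry with the highest priority.
func resolve(entries []Version) map[int64]Version {
	best := make(map[int64]Version)
	for _, e := range entries {
		cur, seen := best[e.Height]
		if !seen || e.Priority > cur.Priority {
			best[e.Height] = e
		}
	}
	return best
}

func main() {
	// Node 1 sees both the shared DATABASE entry and its own LOCAL entry.
	node1 := resolve([]Version{
		{Height: 100, Tag: "v1.0.0", Source: "DATABASE", Priority: 2},
		{Height: 100, Tag: "v1.0.0-patched", Source: "LOCAL", Priority: 3},
	})
	// Nodes 2 and 3 only see the DATABASE entry.
	node2 := resolve([]Version{
		{Height: 100, Tag: "v1.0.0", Source: "DATABASE", Priority: 2},
	})
	fmt.Println(node1[100].Tag, node1[100].Priority) // v1.0.0-patched 3
	fmt.Println(node2[100].Tag, node2[100].Priority) // v1.0.0 2
}
```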

What happens if I don't register a version tag for an upgrade?

Blazar will skip the upgrade.

Blazar doesn't display any upgrades?

Blazar maintains its own state of all upgrades, which is periodically refreshed at the interval specified in your configuration. If you don't see the upgrades, it is likely that you need to wait for the given interval for Blazar to update the state.

NOTE: Adding a new version or upgrade via CLI/UI will trigger a state update.

The upgrade governance proposal passed, but the upgrade is still in the 'SCHEDULED' state?

Blazar will change the upgrade state from 'SCHEDULED' to 'ACTIVE' when the voting period is over.

What is the purpose of the 'force cancel' flag?

There are two ways to cancel an upgrade in Blazar. The standard method creates a cancellation entry in the provider storage, such as an SQL database, if no upgrade is registered. Otherwise, it updates the upgrade status field to CANCELLED for the upgrade with the highest priority.

Blazar periodically fetches and updates the list of upgrades at the interval specified in your configuration. But what if you need to cancel the upgrade immediately and can't wait for the next fetch? For such uncommon scenarios, you can use the force cancel mode, which sets the CANCELLED status directly in the Blazar state machine.

The force mode works per Blazar instance, so if you have, say, 3 nodes, you would need to force cancel all three via CLI/UI/RPC calls. If you use the DATABASE provider, you can simply cancel the upgrade for everyone, but you need to wait for Blazar to pick it up.

To simplify, think of the force cancel as the last line of defense. It is unlikely that you will need it, but it's there just in case.

I registered a new upgrade, but only one node is 'up to date'?

Remember that Blazar refreshes its internal state periodically. If you registered a new upgrade on one instance with the 'DATABASE' provider and the other node doesn't see it, you have two options:

  1. Wait for Blazar to sync (see 'Time to next sync' in the UI).
  2. Force sync via UI/CLI/RPC call.

Does Blazar work with chains with a non-standard gov module (e.g., Neutron)?

Yes, but you'll need to manually register a GOVERNANCE type upgrade in the LOCAL or DATABASE provider.

Neutron is a smart contract chain that implements its own governance (DAO DAO) via an on-chain contract. Blazar currently doesn't understand the custom smart contract logic, so the operator cannot use the CHAIN provider. However, Neutron governance is integrated with the Cosmos SDK upgrade module and will output the upgrade-info.json at the upgrade height. From Blazar's perspective, the GOVERNANCE type is therefore valid, but the source provider must be different.

What is the difference between 'compose-file' and 'env-file' upgrade mode?

When performing a node upgrade, Blazar updates the Docker image version tag (e.g., v1.0.0 to v2.0.0). That version is stored either directly in the docker-compose.yaml file, in the following form:

$ cat docker-compose.yaml | grep 'image'
image: <client_id>.dkr.ecr.us-east-1.amazonaws.com/chorusone/archway:v1.0.0

or in the .env file:

$ cat docker-compose.yaml | grep 'image'
image: <client_id>.dkr.ecr.us-east-1.amazonaws.com/chorusone/archway:${VERSION_archway}

$ cat .env
VERSION_archway=1.0.0

Why do we support both upgrade modes and which one is better?

The compose-file mode is simpler, but we highly recommend the env-file mode. If the version tag is stored in the .env file, the blast radius of a possible mistake is small, because only a single variable is rewritten instead of the whole docker-compose.yaml.
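
To illustrate why the env-file mode keeps the change small, here is a minimal Go sketch of replacing one variable in .env content. It is not Blazar's actual implementation. The function name `setEnvVersion` and the extra `COMPOSE_PROJECT_NAME` line are assumptions made for the example. Only `VERSION_archway` comes from the snippet above.

```go
package main

import (
	"fmt"
	"strings"
)

// setEnvVersion returns the .env content with the value of key replaced by
// version. If key is not present, the content is returned unchanged and the
// second return value is false.
func setEnvVersion(content, key, version string) (string, bool) {
	lines := strings.Split(content, "\n")
	replaced := false
	for i, line := range lines {
		if strings.HasPrefix(line, key+"=") {
			lines[i] = key + "=" + version
			replaced = true
		}
	}
	return strings.Join(lines, "\n"), replaced
}

func main() {
	// Illustrative .env content; only VERSION_archway should change.
	env := "COMPOSE_PROJECT_NAME=archway\nVERSION_archway=1.0.0\n"
	updated, ok := setEnvVersion(env, "VERSION_archway", "2.0.0")
	fmt.Print(updated)
	fmt.Println("replaced:", ok)
}
```

Every line except the matching `KEY=value` pair passes through untouched, which is the property that makes this mode less error-prone than rewriting the compose file.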

What is the purpose of SQL migration files?

This question is relevant for anyone who wants to use the DATABASE provider.

Blazar leverages GORM to manage SQL databases. If you enable automatic migrations by setting:

[upgrade-registry.provider.database]
auto-migrate = true

you can disregard the migration files since GORM will automatically initialize all necessary SQL tables.

However, if auto-migrate is disabled, you'll need to manually apply the migration SQL statements.

License

Blazar is licensed under the Apache 2.0 License. For more detailed information, please refer to the LICENSE file in the repository.
