If you are not familiar with Firehose, read the Overview page, which should give you enough information about the main pieces involved in spinning up a Firehose environment.
In a Solana setup, the Firehose reader fetches its block data using RPC calls. While plenty of providers offer a Solana RPC endpoint, you can also run your own RPC node for this purpose, which is out of the scope of this document (see https://docs.solanalabs.com/operations).
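As a quick sanity check that an endpoint is reachable and responding, you can call Solana's standard `getSlot` JSON-RPC method (the URL below is a placeholder):

```bash
# Placeholder endpoint: substitute your provider's URL or your own node.
$ curl -s http://my.rpc.endpoint:8999 \
    -X POST -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","id":1,"method":"getSlot"}'
```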
Note that a Firehose "poller" can connect to multiple RPC sources for redundancy, and a high-availability (HA) setup can contain multiple pollers that deduplicate the received blocks.
By default, Firehose will try to fetch blocks from genesis (block 0 on Solana) and produce blocks from there. Depending on your use case (and RPC data availability), you may want to choose an absolute lowest block from which to start indexing, ignoring all blocks below it (this is the `common-first-streamable-block` flag used in the configurations below).
Firehose sends block data in a stream AND stores it on disk. A single-machine deployment will work with local storage, but an HA setup works best with an "object storage" service like GCS, S3, or equivalent.
Firehose creates temporary "one-block files", then "merged-block files". (The Substreams service will also create a large number of cache files.)
The compressed merged blocks alone will consume about 61 GiB PER DAY (61 GiB × 365 ≈ 22 TiB per year).
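If you opt for object storage, the stores can be pointed at a bucket through firehose-core's `common-*-store-url` flags. A minimal sketch, assuming a GCS bucket (the bucket name is a placeholder):

```yaml
start:
  flags:
    # Placeholder bucket: replace with your own GCS/S3 location.
    common-one-block-store-url: gs://my-firehose-bucket/one-blocks
    common-merged-blocks-store-url: gs://my-firehose-bucket/merged-blocks
    common-forked-blocks-store-url: gs://my-firehose-bucket/forked-blocks
```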
- Download the latest firecore binary. Make sure you download the right binary for your computer (ARM or x86). Firehose-core is a Go project, so you can always build the binary from source.
- Add the firecore binary to the `$PATH` of your system. Verify that the command works:
```
$ firecore --version
firecore version v1.6.6 (Commit 26b7acc, Built 2024-11-20T18:39:14Z)
```
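If you are unsure how to handle the `$PATH` step above, a typical approach is to copy the binary into a directory that is already on your `$PATH` (the destination below is just a common choice):

```bash
# Install the downloaded binary with executable permissions.
$ install -m 0755 firecore /usr/local/bin/firecore
```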
- Download the latest firesol binary. Make sure you download the right binary for your computer (ARM or x86). Firehose-solana is a Go project, so you can always build the binary from source.
- Add the firesol binary to the `$PATH` of your system. Verify that the command works:
```
$ firesol help
firesol fetching and tooling

Usage:
  firesol [command]
(...)
```
We recommend running Firehose in at least two different processes:

- Process 1 will run the following components: `reader-node`, `merger`, and `relayer`.
- Process 2 will run the following components: `firehose`, `substreams-tier1`, and `substreams-tier2`.
These two processes will share the same folder to persist the data, called `firehose-data` by default. It is important that both processes have concurrent access to this data folder.
- Create a new folder, which will be used to host all the Firehose data generated by the extraction. Make sure the folder has enough permissions to persist data in the filesystem.
```
$ mkdir /srv/firehose-data
```
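Both processes must then be pointed at this folder. A minimal sketch, assuming you rely on firecore's global `--data-dir` flag rather than setting the path in the YAML configuration (by default it is `./firehose-data`, relative to the working directory):

```bash
# Each process is started with the same shared data folder (see the start
# commands at the end of this guide).
$ firecore start -c reader-merger-relayer.yaml --data-dir /srv/firehose-data
```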
- Create the configuration file for Process 1, called `reader-merger-relayer.yaml`:
```yaml
start:
  args: # 1.
    - reader-node
    - merger
    - relayer
  flags:
    common-first-streamable-block: 300000000 # 2.
    reader-node-path: firesol # 3.
    reader-node-arguments:
      fetch rpc 300000000 # 4.
      --state-dir "{node-data-dir}/poller/states"
      --block-fetch-batch-size=4 # 5.
      --endpoints=http://my.rpc.endpoint:8999 # 6.
      --endpoints=<RPC endpoint 2>
      --endpoints=<RPC endpoint 3>
```
1. Specifies the components to spin up.
2. Specifies the first block to stream.
3. Specifies the Solana-specific binary location (optionally with full path).
4. Specifies the arguments of the `firesol` binary. In this case, the RPC poller mode is used with `300000000` as the starting block number.
5. Specifies how many blocks are fetched from the RPC at a time. For example, if it is set to `20`, batches of 20 blocks will be requested from the RPC.
6. Specifies the RPC endpoints. You can specify several of them for redundancy.
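Since `reader-node-arguments` are simply the arguments passed to the `firesol` binary, you can also run the poller on its own to validate your RPC endpoints before wiring it into Firehose (the endpoint and state directory below are placeholders):

```bash
$ firesol fetch rpc 300000000 \
    --state-dir /tmp/poller-states \
    --block-fetch-batch-size=4 \
    --endpoints=http://my.rpc.endpoint:8999
```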
- Create the configuration file for Process 2, called `firehose-substreams.yaml`:
```yaml
start:
  args: # 1.
    - firehose
    - substreams-tier1
    - substreams-tier2
  flags:
    common-first-streamable-block: 300000000 # 2.
    common-live-blocks-addr: localhost:10014 # 3.
    substreams-tier1-block-type: sf.solana.type.v1.Block # 4.
    substreams-tier1-grpc-listen-addr: :9000 # 5.
```
1. Specifies the components to spin up.
2. Specifies the first block to stream.
3. Specifies the address where to reach the `relayer` component running in its own isolated Firehose process.
4. `substreams-tier1` component flag: specifies the data model of the extracted data.
5. `substreams-tier1` component flag: specifies where the Substreams server listens for connections.

- Lastly, start the two Firecore processes in two different command-line terminals:
```
$ firecore start -c reader-merger-relayer.yaml
$ firecore start -c firehose-substreams.yaml
```
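To verify that extraction is working, watch for one-block and merged-block files appearing under the data folder; the paths below assume the default file-based storage layout. Once the Substreams tiers are up, you can also point a Substreams client at the tier1 address configured above (package and module names are placeholders):

```bash
# One-block files appear first; merged bundles of 100 blocks follow.
$ ls /srv/firehose-data/storage/one-blocks
$ ls /srv/firehose-data/storage/merged-blocks

# --plaintext because no TLS was configured on the tier1 listener above.
$ substreams run -e localhost:9000 --plaintext my_package.spkg map_block
```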