bwarelabs/solana-syncer

Solana Syncer

Project Overview

The Solana Syncer is a specialized tool that migrates and synchronizes data from Google Bigtable to Tencent Cloud Object Storage (COS) in sequence file format. It operates in two main modes: ArchiveSync for large-scale, historical data migrations, and LiveSync for real-time data synchronization; a third mode, HbaseSync, loads archived data from COS into HBase. The Syncer is a critical component for ensuring that Solana's data can be stored and accessed efficiently on alternative cloud storage platforms, providing greater flexibility and independence from Google Cloud infrastructure.

ArchiveSync

  • Purpose: Handles the bulk migration of existing historical data from Google Bigtable to Tencent Cloud Object Storage.
  • Process:
    • Data Source: The ArchiveSync process reads existing data stored in Google Bigtable.
    • Conversion: This data is converted into sequence files, a Hadoop-compatible format, enabling efficient storage and retrieval.
    • Storage: The converted sequence files are then uploaded to Tencent Cloud Object Storage.
    • Execution: We used a Slurm cluster to handle the massive scale of data efficiently, but the Syncer can also run as a single instance for smaller datasets using the provided scripts.
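
The conversion step above relies on the key/value record idea that sequence files provide. The real Syncer writes sequence files with Hadoop's own writer classes; the simplified `java.io` sketch below only illustrates the length-prefixed key/value layout and is not the actual on-disk format.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

// Simplified illustration of the key/value record layout that sequence
// files provide: each record is a length-prefixed key (e.g. a slot id)
// followed by a length-prefixed value (the serialized block). This is
// NOT the real Hadoop SequenceFile on-disk format, which the Syncer
// produces via Hadoop's writer classes.
class KeyValueRecords {
    static byte[] write(Map<String, byte[]> records) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            for (Map.Entry<String, byte[]> e : records.entrySet()) {
                byte[] key = e.getKey().getBytes(StandardCharsets.UTF_8);
                out.writeInt(key.length);          // key length prefix
                out.write(key);
                out.writeInt(e.getValue().length); // value length prefix
                out.write(e.getValue());
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Map<String, byte[]> read(byte[] data) {
        try {
            Map<String, byte[]> records = new LinkedHashMap<>();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            while (in.available() > 0) {
                byte[] key = new byte[in.readInt()];
                in.readFully(key);
                byte[] value = new byte[in.readInt()];
                in.readFully(value);
                records.put(new String(key, StandardCharsets.UTF_8), value);
            }
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```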

LiveSync

  • Purpose: Continuously syncs real-time data generated by Solana nodes to Tencent Cloud Object Storage.
  • Process:
    • Data Source: Live data is exposed from Solana nodes using a tool based on the Solana Geyser plugin (see Solana Cos Plugin), which stores data locally on the node’s hard disk.
    • Conversion: The Syncer reads the local files, converts them into sequence files, and prepares them for cloud storage.
    • Storage: The converted data is then uploaded to Tencent Cloud Object Storage in real-time.
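
The local-file intake described above can be sketched with the JDK's WatchService. The timeout and error handling here are illustrative assumptions; the Syncer's actual watching logic lives in the repository.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the LiveSync intake step: watch the directory the
// Solana Cos Plugin writes into (by default /data) and pick up the
// next newly created file for conversion and upload.
class DirectoryIntake {
    // Returns the first file created in `dir` within the timeout, or null.
    static Path awaitNewFile(Path dir, long timeoutSeconds) {
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            WatchKey key = watcher.poll(timeoutSeconds, TimeUnit.SECONDS);
            if (key == null) return null; // nothing arrived in time
            for (WatchEvent<?> event : key.pollEvents()) {
                return dir.resolve((Path) event.context());
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```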

HbaseSync

  • Purpose: Loads historical data from Tencent COS into HBase.
  • Process:
    • Data Source: The HbaseSync process reads sequence files from the Tencent COS bucket and uploads them to HBase.
    • Conversion: In COS, blocks and entries are stored as sequence files; the Syncer reads these files and produces data for the 'blocks', 'tx', 'tx-by-addr', and 'entries' tables. Note that 'tx' and 'tx-by-addr' are generated by the Syncer from the block sequence file.
    • Storage: Data is stored in HBase.
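
The derivation of 'tx' and 'tx-by-addr' from a block can be sketched as below. The row-key formats used here (the signature for 'tx', "address/slot" for 'tx-by-addr') and the field names are simplified assumptions for illustration, not the actual table schema.

```java
import java.util.*;

// Illustrative derivation of the 'tx' and 'tx-by-addr' rows from one
// block, as done by HbaseSync. Row keys are simplified assumptions.
class BlockToRows {
    static final class Tx {
        final String signature;
        final List<String> addresses;
        Tx(String signature, List<String> addresses) {
            this.signature = signature;
            this.addresses = addresses;
        }
    }

    // 'tx' table: one row per transaction, keyed by signature,
    // pointing back at the slot that contains it.
    static Map<String, Long> txRows(long slot, List<Tx> txs) {
        Map<String, Long> rows = new LinkedHashMap<>();
        for (Tx tx : txs) rows.put(tx.signature, slot);
        return rows;
    }

    // 'tx-by-addr' table: one row per (address, slot) pair, listing the
    // signatures that touched that address in the block.
    static Map<String, List<String>> txByAddrRows(long slot, List<Tx> txs) {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        for (Tx tx : txs)
            for (String addr : tx.addresses)
                rows.computeIfAbsent(addr + "/" + slot, k -> new ArrayList<>())
                    .add(tx.signature);
        return rows;
    }
}
```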

System Components and Interaction (also see Solana Test Setup)

  1. Bigtable Emulator:

    • A local emulator for Google Bigtable, allowing developers to simulate Bigtable's environment without incurring costs. It is used during development to test interactions with Bigtable locally.
  2. HBase Container:

    • A local setup of HBase, part of the Hadoop ecosystem, where data is imported from Tencent Cloud Object Storage for querying and analysis.
  3. Solana Bigtable HBase Adapter:

    • Acts as an intermediary interface, mimicking Google Bigtable’s API. This allows the Solana node to interact with HBase as if it were Google Bigtable, enabling seamless data operations without modifying the node's logic.
  4. Solana Lite RPC:

    • A lightweight RPC server used to query data from HBase, providing a simplified interface for accessing the blockchain data stored in HBase.
  5. Validator:

    • A Solana test validator running locally. This component is crucial for simulating a Solana environment during testing, ensuring that the entire setup behaves as expected.
  6. Docker and Docker Compose:

    • The entire system is containerized using Docker, with Docker Compose orchestrating the deployment and interaction between these services. Depending on the use case (e.g., using the Bigtable emulator or the HBase adapter), configuration changes such as uncommenting specific lines in the Docker Compose file and adjusting environment variables are necessary.

Configuration

The Syncer relies on a configuration file, config.properties, to manage settings such as Bigtable connection details, the local file directory to watch, COS credentials, and processing parameters. See the config.properties file for more details.
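
Consuming such a file typically looks like the sketch below, using the standard java.util.Properties loader. The key names and defaults here ('local.watch-dir', 'cos.bucket', 'syncer.batch-size') are hypothetical examples; consult the repository's config.properties for the actual keys the Syncer expects.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch of loading Syncer settings from a properties source.
// All key names and defaults below are hypothetical examples.
class SyncerConfig {
    final String watchDir;   // local directory the Syncer watches
    final String cosBucket;  // destination COS bucket
    final int batchSize;     // processing parameter

    SyncerConfig(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        watchDir = props.getProperty("local.watch-dir", "/data");
        cosBucket = props.getProperty("cos.bucket", "");
        batchSize = Integer.parseInt(props.getProperty("syncer.batch-size", "100"));
    }
}
```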

How to run the Syncer

You can run the Syncer using the existing scripts, either as a single instance or in a cluster using something like Slurm. For local development, see Solana Test Setup.

  1. Set the proper configuration in config.properties, see the existing config.properties for reference.
  2. Add the bigtable-service-account-key.json file to the root directory. You should get this file from the Google Cloud Console.
  3. Run either run_syncer_bigtable.sh or run_syncer_local_files.sh to start the Syncer service, depending on the data source you want to use.
    IMPORTANT: When running the Syncer in local mode, you need to have both the Solana node and the Solana Cos Plugin running and producing data to the local directory specified in the config.properties file. By default, the local directory in which the Syncer will look for files and where the Solana Cos Plugin will write the data is /data.

Generating protobuf implementations (TODO: add this after ingestor is ready)

protoc --java_out=src/main/java proto/*.proto