Unable to run node services, unable to sync #12354

Open
YuXiaoCoder opened this issue Oct 31, 2024 · 13 comments
Labels: community, investigation required, Node

@YuXiaoCoder

Contact Details

[email protected]

Node type

RPC

Which network are you running?

mainnet

What happened?

Unable to run node services, unable to sync

Version

aurora-refiner 0.28.2-2.2.1

Relevant log output

2024-10-31T01:43:24.830032Z  INFO indexer: Load config from /mnt/auroramain/node/near...
2024-10-31T01:43:24.843590Z  WARN neard: /mnt/auroramain/node/near/config.json: encountered unrecognised field: state_sync.?.sync.?.external_storage_fallback_threshold
2024-10-31T01:43:24.843607Z  INFO config: Validating Config, extracted from config.json...
2024-10-31T01:43:24.849762Z  WARN genesis: Skipped genesis validation
2024-10-31T01:43:24.849812Z  INFO config: Validating Genesis config and records. This could take a few minutes...
2024-10-31T01:43:24.850006Z  INFO config: All validations have passed!
2024-10-31T01:43:24.858760Z  INFO db_opener: Opening NodeStorage path="/mnt/auroramain/node/near/data" cold_path="none"
2024-10-31T01:43:24.859324Z  INFO db: Opened a new RocksDB instance. num_instances=1
2024-10-31T01:43:24.915339Z  INFO db: Closed a RocksDB instance. num_instances=0
2024-10-31T01:43:24.915360Z  INFO db_opener: The database exists. path=/mnt/auroramain/node/near/data
2024-10-31T01:43:24.915369Z  INFO db: Opened a new RocksDB instance. num_instances=1
2024-10-31T01:43:25.376571Z  INFO db: Closed a RocksDB instance. num_instances=0
2024-10-31T01:43:25.376614Z  INFO db: Opened a new RocksDB instance. num_instances=1
2024-10-31T01:43:25.407669Z  INFO db: Closed a RocksDB instance. num_instances=0
2024-10-31T01:43:25.407689Z  INFO db: Opened a new RocksDB instance. num_instances=1
2024-10-31T01:43:25.649448Z  INFO runtime: Error when getting the gc stop height. This error may naturally occur after the gc_num_epochs_to_keep config is increased. It should disappear as soon as the node builds up all epochs it wants. Error: DB Not Found Error: epoch block: HRKbCC2Dt4tX7fHApoNnDkAChFYE5kqzWULkejRZZCyg
2024-10-31T01:43:25.664069Z  INFO memtrie: Loading tries to memory for shards []...
2024-10-31T01:43:25.665699Z  INFO memtrie: Memtries loading complete for shards []
2024-10-31T01:43:25.665711Z  INFO chain: Init: header head @ #131590669 7TEa7MedJ7o5vEirfPAzcU5bzCVhpnhwea2gob4fgtyq; block head @ #131432894 HRKbCC2Dt4tX7fHApoNnDkAChFYE5kqzWULkejRZZCyg
thread 'main' panicked at /usr/local/cargo/git/checkouts/nearcore-5bf7818cf2261fd0/971e76d/chain/client/src/client_actor.rs:168:6:
called `Result::unwrap()` on an `Err` value: Chain(DBNotFoundErr("epoch block: HRKbCC2Dt4tX7fHApoNnDkAChFYE5kqzWULkejRZZCyg"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'actix-rt|system:0|arbiter:1' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/time/entry.rs:568:9:
A Tokio 1.x context was found, but it is being shutdown.
thread 'actix-rt|system:0|arbiter:0' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/time/entry.rs:568:9:
A Tokio 1.x context was found, but it is being shutdown.

Node head info

2024-10-31T01:51:56.743438Z  INFO neard: version="2.2.1" build="unknown" latest_protocol=71
2024-10-31T01:51:56.743637Z  WARN neard: /mnt/auroramain/node/near/config.json: encountered unrecognised field: state_sync.?.sync.?.external_storage_fallback_threshold
2024-10-31T01:51:56.743644Z  INFO config: Validating Config, extracted from config.json...
2024-10-31T01:51:56.745643Z  WARN genesis: Skipped genesis validation
2024-10-31T01:51:56.745655Z  WARN genesis: Skipped genesis validation
2024-10-31T01:51:56.745660Z  INFO config: All validations have passed!
2024-10-31T01:51:56.745723Z  INFO db_opener: Opening NodeStorage path="/mnt/auroramain/node/near/data" cold_path="none"
2024-10-31T01:51:56.745736Z  INFO db: Opened a new RocksDB instance. num_instances=1
2024-10-31T01:51:56.770519Z  INFO db: Closed a RocksDB instance. num_instances=0
2024-10-31T01:51:56.770537Z  INFO db_opener: The database exists. path=/mnt/auroramain/node/near/data
2024-10-31T01:51:56.770550Z  INFO db: Opened a new RocksDB instance. num_instances=1
2024-10-31T01:51:56.861987Z  INFO db: Closed a RocksDB instance. num_instances=0
2024-10-31T01:51:56.862020Z  INFO db: Opened a new RocksDB instance. num_instances=1
2024-10-31T01:51:56.883726Z  INFO db: Closed a RocksDB instance. num_instances=0
2024-10-31T01:51:56.883746Z  INFO db: Opened a new RocksDB instance. num_instances=1
"CHUNK_TAIL": 131223643
"FINAL_HEAD": Tip { height: 131432892, last_block_hash: 9sRwBdtfPxUbk46CvYpLPw6ZvqjJLS8Uxnpjsb96HPsR, prev_block_hash: 5Ei6y7b6nec52s1V2rR8ruGUDrQXYLQt2bo3vfLEZieK, epoch_id: EpochId(B69u1PXcKpMWz9ZF5R88Bzts36CiswAGMYM1oG7rbsMk), next_epoch_id: EpochId(7eNGxuudA6PL5WB5jc9vkZWcdmJEftG1Vd2ZLMaajGsn) }
"FORK_TAIL": 131336099
"GENESIS_JSON_HASH": 93on1kcuqTXU94zGyGvBm3YYpPqCkaM8bssbxndgbeRX
"GENESIS_STATE_ROOTS": [8EhZRfDTYujfZoUZtZ3eSMB9gJyFo5zjscR12dEcaxGU]
"HEAD": Tip { height: 131432894, last_block_hash: HRKbCC2Dt4tX7fHApoNnDkAChFYE5kqzWULkejRZZCyg, prev_block_hash: EL8gSixoSmbbdoTGGEGdj8QGhDH4C6KM7EbHoxrRaLzB, epoch_id: EpochId(B69u1PXcKpMWz9ZF5R88Bzts36CiswAGMYM1oG7rbsMk), next_epoch_id: EpochId(7eNGxuudA6PL5WB5jc9vkZWcdmJEftG1Vd2ZLMaajGsn) }
"HEADER_HEAD": Tip { height: 131590669, last_block_hash: 7TEa7MedJ7o5vEirfPAzcU5bzCVhpnhwea2gob4fgtyq, prev_block_hash: EFjaFnUG6xu4U5qQet1herCm9H8wTa558e7Wxoi22hpZ, epoch_id: EpochId(9WSXJ8M5ZPwCZr3sNWeNvh2Bwj3vLDZcVTV15Efpc2bZ), next_epoch_id: EpochId(AvS12NccqdzcZESVaCm7Cr5HpzCjZnbe6Ghy2CZPYsRK) }
"LARGEST_TARGET_HEIGHT": 131433122
"LATEST_KNOWN": LatestKnown { height: 131590669, seen: 1730335140042944331 }
"STATE_SYNC_DUMP:\0\0\0\0\0\0\0\0": AllDumped { epoch_id: EpochId(4c3AEoBnXPoqPM8cQHxqRfXbq5hm6CpJAv9okUSGMMxZ), epoch_height: 2362 }
"STATE_SYNC_DUMP:\u{1}\0\0\0\0\0\0\0": AllDumped { epoch_id: EpochId(4c3AEoBnXPoqPM8cQHxqRfXbq5hm6CpJAv9okUSGMMxZ), epoch_height: 2362 }
"STATE_SYNC_DUMP:\u{2}\0\0\0\0\0\0\0": AllDumped { epoch_id: EpochId(4c3AEoBnXPoqPM8cQHxqRfXbq5hm6CpJAv9okUSGMMxZ), epoch_height: 2362 }
"STATE_SYNC_DUMP:\u{3}\0\0\0\0\0\0\0": AllDumped { epoch_id: EpochId(4c3AEoBnXPoqPM8cQHxqRfXbq5hm6CpJAv9okUSGMMxZ), epoch_height: 2362 }
"SYNC_HEAD": Tip { height: 13740748, last_block_hash: 69A1wh25GwoD2CzEuQhs8D2goWPXqe1Liu3jq1i5tdMS, prev_block_hash: 8ZdbgiXn3JpfGGdMGByMqVE5GppKYrojjHJjgZajNV8Z, epoch_id: EpochId(EeWh36LxiVaZgQRsyCzyAUBhaL5yACKZSjK2vTobAC4d), next_epoch_id: EpochId(7edSVzdsSoo1ujdy79abYv3ztbfx7WDawdhCgKhK5qjj) }
"TAIL": 131336099
2024-10-31T01:51:56.962453Z  INFO db: Closed a RocksDB instance. num_instances=0
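For anyone triaging this: both the startup log and the head dump above show the block head far behind the header head. Below is a minimal Python sketch (not part of nearcore or the refiner; the function name `head_gap` is made up for illustration) that extracts the two heights from the `Init:` line and computes the gap. It assumes the log line keeps the format shown in this issue.

```python
import re

# Matches the "Init:" startup line as it appears in the log in this issue.
INIT_RE = re.compile(
    r"header head @ #(?P<header>\d+) \S+; "
    r"block head @ #(?P<block>\d+) (?P<hash>\S+)"
)


def head_gap(line):
    """Return (header_height, block_height, gap, block_hash), or None if no match."""
    m = INIT_RE.search(line)
    if m is None:
        return None
    header = int(m.group("header"))
    block = int(m.group("block"))
    return header, block, header - block, m.group("hash")


# Line copied from the log above.
INIT_LINE = (
    "2024-10-31T01:43:25.665711Z  INFO chain: Init: header head @ #131590669 "
    "7TEa7MedJ7o5vEirfPAzcU5bzCVhpnhwea2gob4fgtyq; block head @ #131432894 "
    "HRKbCC2Dt4tX7fHApoNnDkAChFYE5kqzWULkejRZZCyg"
)

if __name__ == "__main__":
    print(head_gap(INIT_LINE))
```

For the line above the gap is 157775 blocks, and the block head hash is the same hash named in the `DBNotFoundErr("epoch block: ...")` panic.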

Node upgrade history

time: 2024/09/24 08:28, version: 0.28.2-1.40.0

DB reset history

The node service had been running normally, then yesterday it suddenly reported the above error. We restored it by downloading a snapshot; the node first downloaded the header information and then reported this error again.
@YuXiaoCoder
Author

Using the same snapshot, I upgraded the node to 0.28.2-2.3.0-rc.4. It still fails to synchronise blocks, with the following logs:

2024-10-31T03:22:22.623075Z  INFO bandwidth: Bandwidth stats total_bandwidth_used_by_all_peers=176962720 total_msg_received_count=4232
2024-10-31T03:22:24.025494Z  INFO network: Closing connection to Some(ed25519:[email protected]:24567) err=Recv(IO(Kind(UnexpectedEof)))
2024-10-31T03:23:22.623602Z  INFO bandwidth: Bandwidth stats total_bandwidth_used_by_all_peers=180610320 total_msg_received_count=4217
2024-10-31T03:23:38.850228Z  INFO network: Closing connection to Some(ed25519:[email protected]:24567) err=Recv(IO(Kind(UnexpectedEof)))
2024-10-31T03:24:06.567739Z  INFO network: Closing connection to Some(ed25519:[email protected]:24567) err=Recv(IO(Kind(UnexpectedEof)))
2024-10-31T03:24:22.624763Z  INFO bandwidth: Bandwidth stats total_bandwidth_used_by_all_peers=197019756 total_msg_received_count=4456
2024-10-31T03:24:24.026959Z  INFO network: Closing connection to Some(ed25519:[email protected]:24567) err=Recv(IO(Kind(UnexpectedEof)))
2024-10-31T03:24:56.312199Z  INFO network: Closing connection to Some(ed25519:[email protected]:24567) err=Recv(IO(Kind(UnexpectedEof)))
2024-10-31T03:25:22.625841Z  INFO bandwidth: Bandwidth stats total_bandwidth_used_by_all_peers=213825630 total_msg_received_count=4500
2024-10-31T03:25:22.931576Z  INFO network: Failed to connect to ed25519:[email protected]:24567 err="PeerActor::spawn(): outbound not allowed: already connected to this peer"
2024-10-31T03:25:32.931503Z  INFO network: Failed to connect to ed25519:[email protected]:24567 err="PeerActor::spawn(): outbound not allowed: already connected to this peer"
2024-10-31T03:25:42.920868Z  INFO network: Failed to connect to ed25519:[email protected]:24567 err="PeerActor::spawn(): outbound not allowed: already connected to this peer"
2024-10-31T03:25:52.916033Z  INFO network: Failed to connect to ed25519:[email protected]:24567 err="PeerActor::spawn(): outbound not allowed: already connected to this peer"
2024-10-31T03:26:02.950219Z  INFO network: Failed to connect to ed25519:[email protected]:24567 err="PeerActor::spawn(): outbound not allowed: already connected to this peer"
2024-10-31T03:26:12.926831Z  INFO network: Failed to connect to ed25519:[email protected]:24567 err="PeerActor::spawn(): outbound not allowed: already connected to this peer"
2024-10-31T03:26:22.626399Z  INFO bandwidth: Bandwidth stats total_bandwidth_used_by_all_peers=249099237 total_msg_received_count=5924
2024-10-31T03:27:22.628350Z  INFO bandwidth: Bandwidth stats total_bandwidth_used_by_all_peers=227431702 total_msg_received_count=5631
2024-10-31T03:27:32.473537Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: H8wEeuPiUNektcpJ4NDhB38Aarq6oVzcPBtb4Ab2h6ZS.
2024-10-31T03:27:37.743297Z  INFO network: Closing connection to Some(ed25519:[email protected]:24567) err=Recv(IO(Kind(UnexpectedEof)))
2024-10-31T03:27:42.467066Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: H8wEeuPiUNektcpJ4NDhB38Aarq6oVzcPBtb4Ab2h6ZS.
2024-10-31T03:27:52.467338Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: H8wEeuPiUNektcpJ4NDhB38Aarq6oVzcPBtb4Ab2h6ZS.
2024-10-31T03:28:02.467050Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: H8wEeuPiUNektcpJ4NDhB38Aarq6oVzcPBtb4Ab2h6ZS.
2024-10-31T03:28:12.467973Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: H8wEeuPiUNektcpJ4NDhB38Aarq6oVzcPBtb4Ab2h6ZS.
2024-10-31T03:28:22.468042Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: H8wEeuPiUNektcpJ4NDhB38Aarq6oVzcPBtb4Ab2h6ZS.
2024-10-31T03:28:22.636529Z  INFO bandwidth: Bandwidth stats total_bandwidth_used_by_all_peers=222080917 total_msg_received_count=5531
2024-10-31T03:28:32.467711Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: H8wEeuPiUNektcpJ4NDhB38Aarq6oVzcPBtb4Ab2h6ZS.
2024-10-31T03:28:42.467210Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: H8wEeuPiUNektcpJ4NDhB38Aarq6oVzcPBtb4Ab2h6ZS.
2024-10-31T03:28:52.467504Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: H8wEeuPiUNektcpJ4NDhB38Aarq6oVzcPBtb4Ab2h6ZS.
2024-10-31T03:29:02.467299Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: H8wEeuPiUNektcpJ4NDhB38Aarq6oVzcPBtb4Ab2h6ZS.
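The repeated error lines above all point at a single block hash. As a rough aid for reading logs like this, here is a small Python sketch (the helper name `count_missing` is hypothetical, and the regex is only fitted to the `DB Not Found Error: <kind>: <hash>` lines seen in this thread) that tallies missing entries by kind and hash.

```python
import re
from collections import Counter

# Fitted to lines such as:
#   "... DB Not Found Error: BLOCK: <hash>."
#   "... DB Not Found Error: epoch block: <hash>"
NOT_FOUND_RE = re.compile(
    r"DB Not Found Error: (?P<kind>[A-Za-z ]+): (?P<hash>[0-9A-Za-z]+)"
)


def count_missing(lines):
    """Count 'DB Not Found' log lines, keyed by (kind, hash)."""
    counts = Counter()
    for line in lines:
        m = NOT_FOUND_RE.search(line)
        if m:
            counts[(m.group("kind"), m.group("hash"))] += 1
    return counts


# Sample lines copied from the logs in this issue.
SAMPLE_LINES = [
    "2024-10-31T03:27:32.473537Z ERROR metrics: Error when exporting postponed "
    "receipts count DB Not Found Error: BLOCK: H8wEeuPiUNektcpJ4NDhB38Aarq6oVzcPBtb4Ab2h6ZS.",
    "2024-10-31T03:27:42.467066Z ERROR metrics: Error when exporting postponed "
    "receipts count DB Not Found Error: BLOCK: H8wEeuPiUNektcpJ4NDhB38Aarq6oVzcPBtb4Ab2h6ZS.",
    "2024-10-31T03:27:22.628350Z  INFO bandwidth: Bandwidth stats "
    "total_bandwidth_used_by_all_peers=227431702 total_msg_received_count=5631",
]

if __name__ == "__main__":
    for (kind, block_hash), n in count_missing(SAMPLE_LINES).most_common():
        print(f"{n:4d}  {kind}  {block_hash}")
```

To run it over a real log file, pass an open file object instead of `SAMPLE_LINES`.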

@telezhnaya
Contributor

Hey @YuXiaoCoder,
Could you please download a fresh snapshot (https://near-nodes.io/validator/compile-and-run-a-node#get-data-backup-1) and use the latest mainnet release (https://github.com/near/nearcore/releases/tag/2.3.0)?

@YuXiaoCoder
Author

> Hey @YuXiaoCoder , Could you please download fresh snapshot https://near-nodes.io/validator/compile-and-run-a-node#get-data-backup-1 and use the latest mainnet release? https://github.com/near/nearcore/releases/tag/2.3.0

@telezhnaya Can snapshots be synchronised incrementally?

@telezhnaya
Contributor

Could you please elaborate on this? I don't get the question

@xinzhongyoumeng

> Hey @YuXiaoCoder , Could you please download fresh snapshot https://near-nodes.io/validator/compile-and-run-a-node#get-data-backup-1 and use the latest mainnet release? https://github.com/near/nearcore/releases/tag/2.3.0

Hi @telezhnaya, we hit the same problem and tried this method (using the 2024.11.02 snapshot), but the problem persists.

@xinzhongyoumeng

the log is below:

2024-11-04T12:03:18.698835Z  INFO runtime: Error when getting the gc stop height. This error may naturally occur after the gc_num_epochs_to_keep config is increased. It should disappear as soon as the node builds up all epochs it wants. Error: DB Not Found Error: epoch block: FqFQz1LfV69NAtESdbd1en1ZFtRsCiFuJyEdWvw8BVpm
2024-11-04T12:03:18.700531Z  INFO memtrie: Loading tries to memory for shards []...
2024-11-04T12:03:18.701070Z  INFO memtrie: Memtries loading complete for shards []
2024-11-04T12:03:18.701079Z  INFO chain: Init: header head @ #131789999 Eu5TryA3nFg9rbJqR1c9BagebWKLFhW6cRdzFStt3BEA; block head @ #131665276 FqFQz1LfV69NAtESdbd1en1ZFtRsCiFuJyEdWvw8BVpm
thread 'main' panicked at /usr/local/cargo/git/checkouts/nearcore-5bf7818cf2261fd0/cf41c79/chain/client/src/client_actor.rs:167:6:
called `Result::unwrap()` on an `Err` value: Chain(DBNotFoundErr("epoch block: FqFQz1LfV69NAtESdbd1en1ZFtRsCiFuJyEdWvw8BVpm"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'actix-rt|system:0|arbiter:0' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/time/entry.rs:568:9:
A Tokio 1.x context was found, but it is being shutdown.
thread 'actix-rt|system:0|arbiter:1' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/time/entry.rs:568:9:
A Tokio 1.x context was found, but it is being shutdown.

@telezhnaya
Contributor

@xinzhongyoumeng mainnet or testnet?
What neard version do you use?
Could you please provide the config as well?

@xinzhongyoumeng

> @xinzhongyoumeng mainnet or testnet? What neard version do you use? Could you please provide the config as well?

Network: mainnet
Docker image: nearaurora/srpc2-refiner:0.28.2-2.3.0
Binary: /usr/local/bin/aurora-refiner
The command is below (131538598 is the content of /mnt/auroramain/node/refiner/.REFINER_LAST_BLOCK):

refiner -c /mnt/auroramain/conf/refiner/refiner.json run --height=131538598

The refiner config is below:

{
    "refiner": {
        "chain_id": 1313161554,
        "engine_path": "/mnt/auroramain/node/engine",
        "tx_tracker_path": "/mnt/auroramain/node/engine/tx_tracker",
        "engine_account_id": "aurora"
    },
    "input_mode": {
        "Nearcore": {
            "path": "/mnt/auroramain/node/near"
        }
    },
    "output_storage": {
        "path": "/mnt/auroramain/node/refiner",
        "batch_size": 1000
    },
    "socket_server": {
        "path": "/mnt/auroramain/node/refiner.sock"
    }
}

@YuXiaoCoder
Author

YuXiaoCoder commented Nov 5, 2024

Hi @telezhnaya, we hit the same problem and tried this method (using the 2024.11.02 snapshot), but the problem persists.
neard version: v2.3.0
neard network: mainnet
neard config:

{
    "genesis_file": "/mnt/auroramain/node/near/genesis.json",
    "genesis_records_file": null,
    "node_key_file": "/mnt/auroramain/node/near/node_key.json",
    "rpc": {
        "addr": "0.0.0.0:3030",
        "prometheus_addr": null,
        "cors_allowed_origins": [
            "*"
        ],
        "polling_config": {
            "polling_interval": {
                "secs": 0,
                "nanos": 500000000
            },
            "polling_timeout": {
                "secs": 10,
                "nanos": 0
            }
        },
        "limits_config": {
            "json_payload_max_size": 10485760
        },
        "enable_debug_rpc": false,
        "experimental_debug_pages_src_path": null
    },
    "network": {
        "addr": "0.0.0.0:24567",
        "boot_nodes": "ed25519:[email protected]:24567,ed25519:[email protected]:24567,ed25519:[email protected]:24567,ed25519:[email protected]:24567,ed25519:[email protected]:24567",
        "whitelist_nodes": "",
        "max_num_peers": 40,
        "minimum_outbound_peers": 5,
        "ideal_connections_lo": 30,
        "ideal_connections_hi": 35,
        "so_recv_buffer_size": 1000000,
        "so_send_buffer_size": 1000000,
        "peer_recent_time_window": {
            "secs": 600,
            "nanos": 0
        },
        "safe_set_size": 20,
        "archival_peer_connections_lower_bound": 10,
        "handshake_timeout": {
            "secs": 20,
            "nanos": 0
        },
        "skip_sync_wait": false,
        "ban_window": {
            "secs": 10800,
            "nanos": 0
        },
        "blacklist": [],
        "ttl_account_id_router": {
            "secs": 3600,
            "nanos": 0
        },
        "peer_stats_period": {
            "secs": 5,
            "nanos": 0
        },
        "monitor_peers_max_period": {
            "secs": 60,
            "nanos": 0
        },
        "peer_states_cache_size": 1000,
        "snapshot_hosts_cache_size": 1000,
        "peer_expiration_duration": {
            "secs": 604800,
            "nanos": 0
        },
        "public_addrs": [],
        "allow_private_ip_in_public_addrs": false,
        "trusted_stun_servers": [
            "stun.l.google.com:19302",
            "stun1.l.google.com:19302",
            "stun2.l.google.com:19302",
            "stun3.l.google.com:19302",
            "stun4.l.google.com:19302"
        ],
        "experimental": {
            "inbound_disabled": false,
            "connect_only_to_boot_nodes": false,
            "skip_sending_tombstones_seconds": 0,
            "tier1_enable_inbound": true,
            "tier1_enable_outbound": true,
            "tier1_connect_interval": {
                "secs": 60,
                "nanos": 0
            },
            "tier1_new_connections_per_attempt": 50,
            "network_config_overrides": {
              "connect_to_reliable_peers_on_startup": null,
              "max_send_peers": null,
              "routed_message_ttl": null,
              "max_routes_to_store": null,
              "highest_peer_horizon": null,
              "push_info_period_millis": null,
              "outbound_disabled": null,
              "accounts_data_broadcast_rate_limit_burst": null,
              "accounts_data_broadcast_rate_limit_qps": null,
              "routing_table_update_rate_limit_burst": null,
              "routing_table_update_rate_limit_qps": null,
              "received_messages_rate_limits": null
            }
        }
    },
    "consensus": {
        "min_num_peers": 3,
        "block_production_tracking_delay": {
            "secs": 0,
            "nanos": 100000000
        },
        "min_block_production_delay": {
            "secs": 1,
            "nanos": 300000000
        },
        "max_block_production_delay": {
            "secs": 3,
            "nanos": 0
        },
        "max_block_wait_delay": {
            "secs": 6,
            "nanos": 0
        },
        "produce_empty_blocks": true,
        "block_fetch_horizon": 50,
        "block_header_fetch_horizon": 50,
        "catchup_step_period": {
            "secs": 0,
            "nanos": 100000000
        },
        "chunk_request_retry_period": {
            "secs": 0,
            "nanos": 400000000
        },
        "header_sync_initial_timeout": {
            "secs": 10,
            "nanos": 0
        },
        "header_sync_progress_timeout": {
            "secs": 2,
            "nanos": 0
        },
        "header_sync_stall_ban_timeout": {
            "secs": 120,
            "nanos": 0
        },
        "state_sync_timeout": {
            "secs": 60,
            "nanos": 0
        },
        "header_sync_expected_height_per_second": 10,
        "sync_check_period": {
            "secs": 10,
            "nanos": 0
        },
        "sync_step_period": {
            "secs": 0,
            "nanos": 10000000
        },
        "doomslug_step_period": {
            "secs": 0,
            "nanos": 100000000
        },
        "sync_height_threshold": 1,
        "sync_max_block_requests": 10
    },
    "tracked_accounts": [],
    "tracked_shadow_validator": null,
    "tracked_shards": [
        0
      ],
    "archive": false,
    "log_summary_style": "colored",
    "log_summary_period": {
        "secs": 10,
        "nanos": 0
    },
    "enable_multiline_logging": false,
    "gc_blocks_limit": 2,
    "gc_fork_clean_step": 100,
    "gc_num_epochs_to_keep": 5,
    "gc_step_period": {
        "secs": 1,
        "nanos": 0
      },
    "view_client_threads": 4,
    "epoch_sync_enabled": true,
    "view_client_throttle_period": {
        "secs": 30,
        "nanos": 0
    },
    "trie_viewer_state_size_limit": 50000,
    "store": {
        "path": null,
        "enable_statistics": false,
        "enable_statistics_export": true,
        "max_open_files": 10000,
        "col_state_cache_size": 3221225472,
        "col_flat_state_cache_size": 134217728,
        "block_size": 16384,
        "trie_cache": {
            "default_max_bytes": 500000000,
            "per_shard_max_bytes": {
              "s3.v3": 1500000000,
              "s5.v3": 3000000000,
              "s1.v2": 50000000,
              "s1.v1": 50000000,
              "s3.v1": 3000000000,
              "s4.v2": 3000000000,
              "s1.v3": 50000000,
              "s2.v2": 3000000000,
              "s2.v3": 1500000000
            },
            "shard_cache_deletions_queue_capacity": 100000
          },
        "view_trie_cache": {
            "default_max_bytes": 50000000,
            "per_shard_max_bytes": {},
            "shard_cache_deletions_queue_capacity": 100000
        },
        "enable_receipt_prefetching": true,
        "sweat_prefetch_receivers": [
            "token.sweat",
            "vfinal.token.sweat.testnet"
        ],
        "sweat_prefetch_senders": [
            "oracle.sweat",
            "sweat_the_oracle.testnet"
        ],
        "claim_sweat_prefetch_config": [
            {
              "receiver": "claim.sweat",
              "sender": "token.sweat",
              "method_name": "record_batch_for_hold"
            },
            {
              "receiver": "claim.sweat",
              "sender": "",
              "method_name": "claim"
            }
        ],
        "kaiching_prefetch_config": [
          {
            "receiver": "earn.kaiching",
            "sender": "wallet.kaiching",
            "method_name": "ft_on_transfer"
          }
        ],
        "load_mem_tries_for_shards": [],
        "load_mem_tries_for_tracked_shards": false,
        "state_snapshot_config": {
          "state_snapshot_type": "ForReshardingOnly"
        },
        "state_snapshot_enabled": false
      },
      "state_sync_enabled": true,
      "state_sync": {
        "sync": {
          "ExternalStorage": {
            "location": {
              "GCS": {
                "bucket": "state-parts"
              }
            },
            "num_concurrent_requests": 25,
            "num_concurrent_requests_during_catchup": 5,
            "external_storage_fallback_threshold": 5
          }
        }
      },
      "epoch_sync": {
        "enabled": false,
        "epoch_sync_horizon": 216000,
        "epoch_sync_accept_proof_max_horizon": 86400,
        "timeout_for_epoch_sync": {
          "secs": 60,
          "nanos": 0
        }
      },
      "transaction_pool_size_limit": 100000000,
      "resharding_config": {
        "batch_size": 500000,
        "batch_delay": {
          "secs": 0,
          "nanos": 100000000
        },
        "retry_delay": {
          "secs": 10,
          "nanos": 0
        },
        "initial_delay": {
          "secs": 0,
          "nanos": 0
        },
        "max_poll_time": {
          "secs": 7200,
          "nanos": 0
        }
      },
      "tx_routing_height_horizon": 4,
      "produce_chunk_add_transactions_time_limit": null,
      "orphan_state_witness_pool_size": 25,
      "orphan_state_witness_max_size": 40000000,
      "max_loaded_contracts": 256,
      "save_latest_witnesses": false
}
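Since the config above is long, a short Python sketch like the following can pull out the handful of fields that look most relevant to sync and garbage collection. The selection of fields is my own guess rather than an official nearcore checklist, and the helper names `dig` and `sync_summary` are illustrative only.

```python
import json


def dig(obj, *keys):
    """Walk nested dicts; return None as soon as a key is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def sync_summary(config):
    """Collect a few sync/GC related settings from a parsed config.json."""
    storage = ("state_sync", "sync", "ExternalStorage")
    return {
        "archive": config.get("archive"),
        "tracked_shards": config.get("tracked_shards"),
        "gc_num_epochs_to_keep": config.get("gc_num_epochs_to_keep"),
        "state_sync_enabled": config.get("state_sync_enabled"),
        "state_sync_bucket": dig(config, *storage, "location", "GCS", "bucket"),
        "external_storage_fallback_threshold": dig(
            config, *storage, "external_storage_fallback_threshold"
        ),
        "epoch_sync.enabled": dig(config, "epoch_sync", "enabled"),
    }


# Fragment copied from the config above; for a real node, replace this with
# json.load(open(path_to_config_json)).
SAMPLE = json.loads("""
{
    "tracked_shards": [0],
    "archive": false,
    "gc_num_epochs_to_keep": 5,
    "state_sync_enabled": true,
    "state_sync": {
        "sync": {
            "ExternalStorage": {
                "location": {"GCS": {"bucket": "state-parts"}},
                "num_concurrent_requests": 25,
                "num_concurrent_requests_during_catchup": 5,
                "external_storage_fallback_threshold": 5
            }
        }
    },
    "epoch_sync": {"enabled": false}
}
""")

if __name__ == "__main__":
    for name, value in sync_summary(SAMPLE).items():
        print(f"{name}: {value}")
```

Note that `external_storage_fallback_threshold` is the field that neard 2.2.1 reported as unrecognised in the first log of this issue.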

@VanBarbascu
Contributor

VanBarbascu commented Nov 5, 2024

Hi, @YuXiaoCoder! I am not familiar with Aurora's refiner tool.

Your config.json looks OK. Can you try running just the neard binary on a clean snapshot to see if you encounter the same issue?
neard --home /mnt/auroramain/node/near run

@xinzhongyoumeng

xinzhongyoumeng commented Nov 6, 2024

> Hi, @YuXiaoCoder! I am not familiar with Aurora's refiner tool.
>
> Your config.json looks ok. Can you try to run just the neard binary on a clean snapshot to see if you encounter the same issue? neard --home /mnt/auroramain/node/near run

neard version: 2.3.0
snapshot: 2024.11.05
The logs:

2024-11-05T23:26:59.142986Z  INFO stats: #131974464 Downloading headers 93.79% (4713 left; at 132045670) 34 peers ⬇ 5.41 MB/s ⬆ 3.46 MB/s 0.00 bps 0 gas/s CPU: 108%, Mem: 1.04 GB
2024-11-05T23:27:09.571413Z  INFO stats: #131974464 Downloading headers 95.80% (3187 left; at 132047206) 34 peers ⬇ 5.70 MB/s ⬆ 4.33 MB/s 0.00 bps 0 gas/s CPU: 95%, Mem: 1.05 GB
2024-11-05T23:27:21.810205Z  INFO stats: #131974464 Downloading headers 97.14% (2174 left; at 132048230) 35 peers ⬇ 5.46 MB/s ⬆ 4.04 MB/s 0.00 bps 0 gas/s CPU: 67%, Mem: 1.08 GB
2024-11-05T23:27:32.865827Z  INFO stats: #131974464 Downloading headers 99.15% (647 left; at 132049767) 34 peers ⬇ 5.25 MB/s ⬆ 3.83 MB/s 0.00 bps 0 gas/s CPU: 105%, Mem: 1.03 GB
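The header download lines above can be turned into a rough sync rate. This is a minimal Python sketch under the assumption that the `stats` line keeps the exact layout shown here; `parse_stats` and `headers_per_second` are hypothetical helper names, not nearcore tooling.

```python
import re
from datetime import datetime

# Fitted to: "<ts>  INFO stats: #<head> Downloading headers <pct>% (<left> left; at <height>) ..."
STATS_RE = re.compile(
    r"^(?P<ts>\S+)\s+INFO stats: #(?P<head>\d+) Downloading headers "
    r"(?P<pct>[\d.]+)% \((?P<left>\d+) left; at (?P<at>\d+)\)"
)


def parse_stats(line):
    """Return (timestamp, block_head, headers_left, header_height) or None."""
    m = STATS_RE.match(line)
    if m is None:
        return None
    # The log timestamps end in "Z"; use an explicit UTC offset so that
    # datetime.fromisoformat accepts them on older Python versions too.
    ts = datetime.fromisoformat(m.group("ts").replace("Z", "+00:00"))
    return ts, int(m.group("head")), int(m.group("left")), int(m.group("at"))


def headers_per_second(first, second):
    """Average header download rate between two parsed stats lines."""
    elapsed = (second[0] - first[0]).total_seconds()
    return (second[3] - first[3]) / elapsed


# Two consecutive lines copied from the log above (tail of each line trimmed).
LINE_A = ("2024-11-05T23:26:59.142986Z  INFO stats: #131974464 Downloading headers "
          "93.79% (4713 left; at 132045670) 34 peers")
LINE_B = ("2024-11-05T23:27:09.571413Z  INFO stats: #131974464 Downloading headers "
          "95.80% (3187 left; at 132047206) 34 peers")

if __name__ == "__main__":
    a, b = parse_stats(LINE_A), parse_stats(LINE_B)
    print(f"{headers_per_second(a, b):.1f} headers/s, block head still at #{b[1]}")
```

In these lines the header height advances while the block head stays at #131974464, which matches the symptom described in this issue: headers download, but blocks do not.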



2024-11-06T00:26:07.734895Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:26:17.728533Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:26:27.728671Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:26:37.728751Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:26:47.728756Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:26:57.729266Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:27:07.728611Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:27:17.729038Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:27:27.728467Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:27:37.729197Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:27:47.728851Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:27:57.728623Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:28:07.728601Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:28:17.728812Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:28:27.729370Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:28:37.729283Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:28:47.728903Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:28:57.728478Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:29:07.729015Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:29:17.728859Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:29:27.728868Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:29:37.729073Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:29:47.728838Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:29:57.728497Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:30:07.728894Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:30:17.728598Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:56:50.797743Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
2024-11-06T00:57:00.797900Z ERROR metrics: Error when exporting postponed receipts count DB Not Found Error: BLOCK: 84mQ5Fi8Lbyu3pA3Wp1t2rpDwswcwqAvaRAruF1nLU6K.
/opt/nearmain/supervisor/node_command.sh: line 51: 26719 Killed                  ${COMMAND}
[2024-11-06 08:57:09] [node_command.sh] exec command [/opt/nearmain/core/neard --home=/mnt/nearmain/node run]
2024-11-06T00:57:09.607258Z  INFO neard: version="2.3.0" build="unknown" latest_protocol=72
2024-11-06T00:57:09.617546Z  INFO config: Validating Config, extracted from config.json...
2024-11-06T00:57:09.624459Z  WARN genesis: Skipped genesis validation
2024-11-06T00:57:09.624483Z  INFO config: Validating Genesis config and records. This could take a few minutes...
2024-11-06T00:57:09.626461Z  INFO config: All validations have passed!
2024-11-06T00:57:09.640040Z  INFO neard: Reset the config "/mnt/nearmain/node/log_config.json" because the config file doesn't exist. err=Os { code: 2, kind: NotFound, message: "No such file or directory" }
2024-11-06T00:57:09.641397Z  INFO config: Validating Config, extracted from config.json...
2024-11-06T00:57:09.641411Z  INFO neard: No validator key /mnt/nearmain/node/validator_key.json.
2024-11-06T00:57:09.641455Z  INFO near_o11y::reload: Updated the logging layer according to `log_config.json`
2024-11-06T00:57:09.641491Z  INFO db_opener: Opening NodeStorage path="/mnt/nearmain/node/data" cold_path="none"
2024-11-06T00:57:09.642518Z  INFO db: Opened a new RocksDB instance. num_instances=1
2024-11-06T00:57:13.478106Z  INFO db: Closed a RocksDB instance. num_instances=0
2024-11-06T00:57:13.478136Z  INFO db_opener: The database exists. path=/mnt/nearmain/node/data
2024-11-06T00:57:13.478694Z  INFO db: Opened a new RocksDB instance. num_instances=1
2024-11-06T00:57:16.628666Z  INFO db: Closed a RocksDB instance. num_instances=0
2024-11-06T00:57:16.628721Z  INFO db: Opened a new RocksDB instance. num_instances=1
2024-11-06T00:57:16.661050Z  INFO db: Closed a RocksDB instance. num_instances=0
2024-11-06T00:57:16.661076Z  INFO db: Opened a new RocksDB instance. num_instances=1
2024-11-06T00:57:16.926395Z  INFO db_opener: Opening NodeStorage path="/mnt/nearmain/node/data/state_snapshot/9ncvbuieh6E6aCpECMp39LxDGdLdvJWSFELLVFEGsMdt/data" cold_path="none"
2024-11-06T00:57:16.927965Z  INFO db: Opened a new RocksDB instance. num_instances=2
2024-11-06T00:57:16.937217Z  INFO db: Closed a RocksDB instance. num_instances=1
2024-11-06T00:57:16.937250Z  INFO db_opener: The database exists. path=/mnt/nearmain/node/data/state_snapshot/9ncvbuieh6E6aCpECMp39LxDGdLdvJWSFELLVFEGsMdt/data
2024-11-06T00:57:16.937259Z  INFO db: Opened a new RocksDB instance. num_instances=2
2024-11-06T00:57:16.982848Z  INFO db: Closed a RocksDB instance. num_instances=1
2024-11-06T00:57:16.982878Z  INFO db: Opened a new RocksDB instance. num_instances=2
2024-11-06T00:57:16.985647Z  INFO db: Closed a RocksDB instance. num_instances=1
2024-11-06T00:57:16.985660Z  INFO db: Opened a new RocksDB instance. num_instances=2
2024-11-06T00:57:17.000165Z  INFO db: Closed a RocksDB instance. num_instances=1
thread 'main' panicked at chain/client/src/client_actor.rs:167:6:
called `Result::unwrap()` on an `Err` value: Chain(DBNotFoundErr("epoch block: 6QgMx3dsDmbU8R85wf7ZWTLpEAd8Gw1ssAEhQqcVxa7T"))
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: nearcore::start_with_config_and_synchronization
   4: neard::cli::RunCmd::run::{{closure}}
   5: tokio::task::local::LocalSet::run_until::{{closure}}
   6: neard::cli::NeardCmd::parse_and_run
   7: neard::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
/opt/nearmain/supervisor/node_command.sh: line 51: 30740 Aborted                 (core dumped) ${COMMAND}
[2024-11-06 08:57:18] [node_command.sh] exec command [/opt/nearmain/core/neard --home=/mnt/nearmain/node run]
2024-11-06T00:57:18.117380Z  INFO neard: version="2.3.0" build="unknown" latest_protocol=72
2024-11-06T00:57:18.117668Z  INFO config: Validating Config, extracted from config.json...
2024-11-06T00:57:18.119631Z  WARN genesis: Skipped genesis validation
2024-11-06T00:57:18.119649Z  INFO config: Validating Genesis config and records. This could take a few minutes...
2024-11-06T00:57:18.119850Z  INFO config: All validations have passed!
2024-11-06T00:57:18.121326Z  INFO neard: Reset the config "/mnt/nearmain/node/log_config.json" because the config file doesn't exist. err=Os { code: 2, kind: NotFound, message: "No such file or directory" }
2024-11-06T00:57:18.121490Z  INFO config: Validating Config, extracted from config.json...
2024-11-06T00:57:18.121503Z  INFO neard: No validator key /mnt/nearmain/node/validator_key.json.
2024-11-06T00:57:18.121545Z  INFO near_o11y::reload: Updated the logging layer according to `log_config.json`
2024-11-06T00:57:18.121566Z  INFO db_opener: Opening NodeStorage path="/mnt/nearmain/node/data" cold_path="none"
2024-11-06T00:57:18.121580Z  INFO db: Opened a new RocksDB instance. num_instances=1
2024-11-06T00:57:18.155142Z  INFO db: Closed a RocksDB instance. num_instances=0
2024-11-06T00:57:18.155166Z  INFO db_opener: The database exists. path=/mnt/nearmain/node/data
2024-11-06T00:57:18.155180Z  INFO db: Opened a new RocksDB instance. num_instances=1
2024-11-06T00:57:18.465798Z  INFO db: Closed a RocksDB instance. num_instances=0
2024-11-06T00:57:18.465845Z  INFO db: Opened a new RocksDB instance. num_instances=1
2024-11-06T00:57:18.500391Z  INFO db: Closed a RocksDB instance. num_instances=0
2024-11-06T00:57:18.500413Z  INFO db: Opened a new RocksDB instance. num_instances=1
2024-11-06T00:57:18.761496Z  INFO db_opener: Opening NodeStorage path="/mnt/nearmain/node/data/state_snapshot/9ncvbuieh6E6aCpECMp39LxDGdLdvJWSFELLVFEGsMdt/data" cold_path="none"
2024-11-06T00:57:18.761798Z  INFO db: Opened a new RocksDB instance. num_instances=2
2024-11-06T00:57:18.768142Z  INFO db: Closed a RocksDB instance. num_instances=1
2024-11-06T00:57:18.768161Z  INFO db_opener: The database exists. path=/mnt/nearmain/node/data/state_snapshot/9ncvbuieh6E6aCpECMp39LxDGdLdvJWSFELLVFEGsMdt/data
2024-11-06T00:57:18.768168Z  INFO db: Opened a new RocksDB instance. num_instances=2
2024-11-06T00:57:18.783895Z  INFO db: Closed a RocksDB instance. num_instances=1
2024-11-06T00:57:18.783966Z  INFO db: Opened a new RocksDB instance. num_instances=2
2024-11-06T00:57:18.787127Z  INFO db: Closed a RocksDB instance. num_instances=1
2024-11-06T00:57:18.787142Z  INFO db: Opened a new RocksDB instance. num_instances=2
2024-11-06T00:57:18.801651Z  INFO db: Closed a RocksDB instance. num_instances=1
thread 'main' panicked at chain/client/src/client_actor.rs:167:6:
called `Result::unwrap()` on an `Err` value: Chain(DBNotFoundErr("epoch block: 6QgMx3dsDmbU8R85wf7ZWTLpEAd8Gw1ssAEhQqcVxa7T"))
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: nearcore::start_with_config_and_synchronization
   4: neard::cli::RunCmd::run::{{closure}}
   5: tokio::task::local::LocalSet::run_until::{{closure}}
   6: neard::cli::NeardCmd::parse_and_run
   7: neard::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
/opt/nearmain/supervisor/node_command.sh: line 51: 31181 Aborted                 (core dumped) ${COMMAND}
[2024-11-06 08:57:20] [node_command.sh] exec command [/opt/nearmain/core/neard --home=/mnt/nearmain/node run]

@xinzhongyoumeng

Could the snapshot be bad?

@VanBarbascu
Contributor

VanBarbascu commented Nov 7, 2024

Judging by the gap in the logs between header sync at 2024-11-05T23:27:32.865827Z and the node crashing at 2024-11-06T00:57:00.797900Z, the node appears to have been doing garbage collection and then state sync.

/opt/nearmain/supervisor/node_command.sh: line 51: 26719 Killed                  ${COMMAND}
[2024-11-06 08:57:09] [node_command.sh] exec command [/opt/nearmain/core/neard --home=/mnt/nearmain/node run]

This log tells me that the process was killed and restarted. This may have corrupted the DB.
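To separate the two kinds of exits visible in the log above, here is a minimal sketch that scans the supervisor output for the shell's job-status lines. It assumes the lines keep the shape printed by node_command.sh in this issue; the names EXIT_LINE and classify_exits are made up for illustration. The shell prints "Killed" for SIGKILL and "Aborted" for SIGABRT, so the first exit (pid 26719) was forced from outside the process, while the later ones follow the panic.

```python
import re

# Shell job-status lines as they appear in this issue, for example:
#   /opt/nearmain/supervisor/node_command.sh: line 51: 26719 Killed   ${COMMAND}
# Assumption: the supervisor script keeps emitting them in this shape.
EXIT_LINE = re.compile(
    r"^(?P<script>\S+): line (?P<lineno>\d+): +(?P<pid>\d+) +"
    r"(?P<status>Killed|Aborted|Terminated|Segmentation fault)"
)


def classify_exits(log_lines):
    """Return (pid, status) for every abnormal exit the shell reported."""
    exits = []
    for line in log_lines:
        match = EXIT_LINE.match(line)
        if match:
            exits.append((int(match.group("pid")), match.group("status")))
    return exits
```

Feeding it the full log file line by line gives a chronological list of exits; a "Killed" entry that precedes the first DBNotFoundErr panic fits the theory that the forced kill left the database in a bad state.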

Check sudo dmesg around the time of the restart; it may show why your process was killed.
It could be the OOM killer. If so, how much memory is available on that host?
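One way to run that check is sketched below. It assumes a Linux host where the kernel reports OOM victims with a "Killed process <pid> (<name>)" message, which is the usual wording but can vary between kernel versions. The helper names find_oom_kills and read_dmesg are hypothetical, and reading dmesg normally requires root.

```python
import re
import subprocess

# Assumption: the kernel logs OOM victims as "... Killed process <pid> (<name>) ...".
OOM_KILL = re.compile(r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)")


def find_oom_kills(kernel_log):
    """Return (pid, process name) for each OOM kill found in the kernel log text."""
    victims = []
    for line in kernel_log.splitlines():
        match = OOM_KILL.search(line)
        if match:
            victims.append((int(match.group("pid")), match.group("name")))
    return victims


def read_dmesg():
    """Read the kernel ring buffer; -T asks dmesg for human-readable timestamps."""
    result = subprocess.run(
        ["dmesg", "-T"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling find_oom_kills(read_dmesg()) and looking for pid 26719 would confirm or rule out the OOM killer for the exit quoted above. The kernel ring buffer is finite, so older entries may already have been overwritten; the output of free -h answers the follow-up question about available memory.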
