failed to send update info to requester node error="failed to get nodestate during node registration: nodeInfo not found for nodeID" #4694

Closed
chungyan5 opened this issue Nov 6, 2024 · 2 comments
Labels
request/new Request: Indicates a new request that has been submitted and awaits initial triage type/bug Type: Something is not working as expected

Bug Description

I am new to Bacalhau. I tried to set up one Requester and one Compute Node by following the Create Network guide, but the Compute Node reports the error message above.

Expected Behavior

Requester(s) and Compute Node(s) can communicate.

Steps to Reproduce

  1. I have two Ubuntu Linux servers on the same network.
  2. Install Bacalhau (1.5.1) on both.
  3. Set up the token.
  4. Run the Requester and the Compute Node.
  5. The Requester runs smoothly; the Compute Node reports the error message above.
  6. For more information: when I set up a single node as both Requester and Compute on the same computer, it works and I can submit a job to run.
  7. So it may be a communication issue between the two nodes.
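One way to test the communication hypothesis in step 7 is a plain TCP reachability check, run from the Compute Node machine against the orchestrator. The following is only a minimal sketch: the helper name `can_connect` is hypothetical, the address 192.168.1.58 is taken from the Compute Node command in the logs, and port 4222 is taken from the `orchestrator_address` field in the Requester's startup log. A successful connection only shows that the port is reachable; it does not prove that node registration will succeed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it is not part of Bacalhau.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # Assumed values, copied from the logs in this issue:
    # the orchestrator host from the Compute Node command line and
    # the orchestrator port from the Requester's startup log line.
    host, port = "192.168.1.58", 4222
    status = "reachable" if can_connect(host, port) else "NOT reachable"
    print(f"{host}:{port} is {status}")
```

If the check reports the port as not reachable, a firewall rule or routing problem between the two servers would be a plausible place to look next.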

Bacalhau Versions

  • Agent Version: Run bacalhau agent version to get this.
    Bacalhau v1.5.1
    BuildDate 2024-10-28 06:10:18 +0000 UTC
    GitCommit 2c4963f

  • CLI Client Version: Run bacalhau version for the client info.
    CLIENT SERVER LATEST UPDATE MESSAGE
    v1.5.1 v1.5.1 1.5.1

Host Environment

Provide details about the environment where the bug occurred:

  • Operating System:
      1. Ubuntu 18.04
      2. Ubuntu 24.04
  • CPU Architecture:
      1. Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
      2. Intel(R) Xeon(R) w5-3425
  • Any other relevant environment details:

Job Specification

(If applicable, provide the job spec used when the issue occurred.)

Logs

Requester Logs:

bacalhau serve --orchestrator
00:45:53.786 | INF cmd/cli/serve/serve.go:103 > Config loaded from: [/home/easystore/.bacalhau/config.yaml], and with data-dir /home/easystore/.bacalhau
00:45:53.787 | INF cmd/cli/serve/serve.go:181 > Starting bacalhau...
00:45:54.835 | INF cmd/cli/serve/serve.go:256 > bacalhau node running [address:0.0.0.0:1234] [compute_enabled:false] [name:n-adc8ea97-fcbc-4efb-9ccc-f23040349a7d] [orchestrator_address:0.0.0.0:4222] [orchestrator_enabled:true] [webui_enabled:false]

To connect to this node from the local client, run the following commands in your shell:
export BACALHAU_API_HOST=127.0.0.1
export BACALHAU_API_PORT=1234

A copy of these variables have been written to: /home/easystore/.bacalhau/bacalhau.run
00:54:02.637 | WRN pkg/orchestrator/planner/logging_planner.go:48 > Job failed [Details:{"IsError":"true","NodesAvailable":"0","NodesRequested":"1","NodesSuitable":"0"}] [EvalID:ba18a0d8-cc76-4d71-8ebe-c885652934b5] [Event:"not enough nodes to run job. requested: 1, available: 0, suitable: 0."] [JobID:j-fdaebcd3-2a71-40e1-b9e9-52b601e56ae0] [NodeID:n-adc8ea97]

Compute Node Logs:

bacalhau serve --compute --config Compute.Orchestrators=192.168.1.58
11:03:55.117 | INF cmd/cli/serve/serve.go:103 > Config loaded from: [/home/easystore/.bacalhau/config.yaml], and with data-dir /home/easystore/.bacalhau
11:03:55.117 | INF cmd/cli/serve/serve.go:181 > Starting bacalhau...
11:03:56.157 | INF cmd/cli/serve/serve.go:256 > bacalhau node running [address:0.0.0.0:1234] [capacity:"{CPU: 16.80, Memory: 94 GB, Disk: 671 GB, GPU: 0}"]
[compute_enabled:true] [engines:["docker","wasm"]] [name:n-389eb261-e61b-47cf-91f1-a621e198cd25] [orchestrator_enabled:false] [orchestrators:["192.168.1.58"]] [publishers:["local","noop"]] [storages:["urldownload","inline"]] [webui_enabled:false]

To connect to this node from the local client, run the following commands in your shell:
export BACALHAU_API_HOST=127.0.0.1
export BACALHAU_API_PORT=1234

A copy of these variables have been written to: /home/easystore/.bacalhau/bacalhau.run
11:04:56.146 | ERR pkg/compute/management_client.go:117 > failed to send update info to requester node error="failed to get nodestate during node registration: nodeInfo not found for nodeID: n-389eb261-e61b-47cf-91f1-a621e198cd25" [NodeID:n-389eb261]

@chungyan5
Author

Hi all,

I tried another network with different computers, and it works. I will investigate the issue with the first network and the first set of computers myself. Thanks.
