First node for CCF cluster failed to come up in CACI #6465

Closed
gaurav137 opened this issue Sep 5, 2024 · 4 comments · Fixed by #6478

@gaurav137

gaurav137 commented Sep 5, 2024

Hit this issue on CACI where the start node itself has not come up. Looking at the container logs I see:
[fail ] node/quote_endorsements_client.h:136 | Giving up retrying fetching attestation endorsements from 10.92.0.12 after 3 attempts.

The node/state endpoint reports state as Initialized and hasn't transitioned to PartOfNetwork.

Filing the bug to track this issue.

2024-09-04T16:32:05.763122Z        0   [info ] node/quote_endorsements_client.h:104 | Fetching endorsements for attestation report at http://10.92.0.12:2377/metadata/THIM/amd/certification?platformId=6c9871d8cd30ddb602591f746b7971db7e4457ee477be925e44c68753178492c48447f8c9cdee4f61fdc5907d4767c7903ad7490aed302a7a0c7584d13c17375&tcbVersion=d315000000000004
 
2024-09-04T16:32:05.770874Z -0.003 0   [info ] node/quote_endorsements_client.h:104 | Fetching endorsements for attestation report at http://10.92.0.12:2377/metadata/THIM/amd/certification?platformId=6c9871d8cd30ddb602591f746b7971db7e4457ee477be925e44c68753178492c48447f8c9cdee4f61fdc5907d4767c7903ad7490aed302a7a0c7584d13c17375&tcbVersion=d315000000000004
 
2024-09-04T16:32:08.778701Z -0.004 0   [info ] node/quote_endorsements_client.h:104 | Fetching endorsements for attestation report at http://10.92.0.12:2377/metadata/THIM/amd/certification?platformId=6c9871d8cd30ddb602591f746b7971db7e4457ee477be925e44c68753178492c48447f8c9cdee4f61fdc5907d4767c7903ad7490aed302a7a0c7584d13c17375&tcbVersion=d315000000000004
 
2024-09-04T16:32:11.778697Z -0.004 0   [fail ] node/quote_endorsements_client.h:136 | Giving up retrying fetching attestation endorsements from 10.92.0.12 after 3 attempts

gaurav137 added the bug label Sep 5, 2024

@gaurav137
Author

gaurav137 commented Sep 5, 2024

Perhaps the logic could retry more times and/or wait longer between retries; the logs above show the retries happening in quick succession. While debugging, I ran a curl command against the same endpoint from inside the container and it worked, so this looks like an intermittent issue.

achamayou added a commit to achamayou/CCF that referenced this issue Sep 11, 2024
@achamayou achamayou self-assigned this Sep 12, 2024
@achamayou
Member

@gaurav137 I think what we want here mainly is:

  1. More retries.
  2. Node shut down if all attempts fail.

Does that make sense to you?
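
For illustration, the two points above (more retries, and a hard failure once every attempt is exhausted), together with the longer gaps suggested earlier in the thread, could be sketched as follows. This is a minimal Python sketch, not CCF's C++ implementation and not the change made in the linked fix; `fetch_with_retries`, `EndorsementsUnavailable`, `fetch_once` and all default values are hypothetical names and numbers chosen for the example.

```python
import time
from typing import Callable, Optional


class EndorsementsUnavailable(RuntimeError):
    """Raised when every fetch attempt has failed (hypothetical name)."""


def fetch_with_retries(
    fetch_once: Callable[[], Optional[bytes]],
    max_attempts: int = 10,          # illustrative default, not CCF's
    initial_delay_s: float = 1.0,    # illustrative default, not CCF's
    backoff_factor: float = 2.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch_once until it returns data, waiting longer after each failure.

    fetch_once returns the payload on success and None on failure.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    delay = initial_delay_s
    for attempt in range(1, max_attempts + 1):
        result = fetch_once()
        if result is not None:
            return result
        if attempt < max_attempts:
            # Wait before the next attempt, then grow the gap up to a cap.
            sleep(delay)
            delay = min(delay * backoff_factor, max_delay_s)

    # All attempts failed: surface this so the caller can shut the node down
    # instead of leaving it stuck in a half-started state.
    raise EndorsementsUnavailable(f"giving up after {max_attempts} attempts")
```

A caller would catch `EndorsementsUnavailable` at the top level and exit with a non-zero status, so that an orchestrator can see the failure and restart the container rather than waiting on a node that never leaves its initial state.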

@gaurav137
Author

gaurav137 commented Sep 17, 2024

@achamayou yes, sounds good. I'd also suggest making the retry attempts/timeouts configurable via the cchost config file rather than baked into the code. That way, tweaking them does not require new CCF builds/images.
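
As a rough illustration of that suggestion, retry settings could be read from a JSON config file with defaults applied when the keys are absent. This is a Python sketch under stated assumptions: the `endorsements_retry` section and its `max_attempts` and `retry_interval_s` keys are hypothetical and are not the actual cchost configuration schema. The defaults only mirror the behaviour visible in the logs above (3 attempts, roughly 3 seconds apart).

```python
import json
from dataclasses import dataclass

# Defaults chosen to match what the logs in this issue show; assumptions only.
DEFAULT_MAX_ATTEMPTS = 3
DEFAULT_RETRY_INTERVAL_S = 3.0


@dataclass(frozen=True)
class RetryConfig:
    max_attempts: int = DEFAULT_MAX_ATTEMPTS
    retry_interval_s: float = DEFAULT_RETRY_INTERVAL_S


def load_retry_config(config_text: str) -> RetryConfig:
    """Parse retry settings from JSON text, falling back to defaults."""
    doc = json.loads(config_text)
    # "endorsements_retry" is a hypothetical section name for this sketch.
    section = doc.get("endorsements_retry", {})
    cfg = RetryConfig(
        max_attempts=int(section.get("max_attempts", DEFAULT_MAX_ATTEMPTS)),
        retry_interval_s=float(
            section.get("retry_interval_s", DEFAULT_RETRY_INTERVAL_S)
        ),
    )
    # Reject values that would make the retry loop meaningless.
    if cfg.max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if cfg.retry_interval_s < 0:
        raise ValueError("retry_interval_s must not be negative")
    return cfg
```

With this shape, an operator could raise the attempt count or the interval by editing the config file and restarting the node, with no rebuild of the image.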

@achamayou
Member

@gaurav137 yes, please see #6478
