bug: CML Runner Registration #126

Closed
leonardcser opened this issue Aug 3, 2023 · 3 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@leonardcser
Contributor

leonardcser commented Aug 3, 2023

In chapter 15, CML successfully creates the runner on GCP, but the workflow then hangs on the setup-runner step.

Behaviour

  1. The CI/CD workflow starts on GitHub.
  2. CML creates the runner on GCP.
  3. The setup-runner step hangs while Terraform waits:
    {"level":"info","message":"iterative_cml_runner.runner: Still creating...
  4. After 5-7 minutes, the GCP pod terminates automatically.
  5. The GitHub workflow keeps hanging at the setup-runner step, with Terraform still waiting.

Below is the output of the runner pod:

> kubectl logs -f cml-bo4s2uhzqs-2qx6z08y-ig1rgwq0-lg67g

Failed to get unit file state for cml.service: No such file or directory
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 84.5M  100 84.5M    0     0  28.4M      0  0:00:02  0:00:02 --:--:-- 37.8M
bash: line 24: lsof: command not found
{"level":"info","message":"POST /repos/leonardcser/mlops-test/actions/runners/registration-token - 201 in 275ms"}
{"level":"info","message":"GET /repos/leonardcser/mlops-test/actions/runners?per_page=100 - 200 in 215ms"}
{"level":"warn","message":"Github Actions timeout has been updated from 72h to 35 days. Update your workflow accordingly to be able to restart it automatically."}
{"level":"info","message":"Preparing workdir /home/runner..."}
{"level":"info","message":"Launching github runner"}
{"level":"info","message":"Terraform 1.5.4"}
{"level":"info","message":"Plan: 0 to add, 0 to change, 0 to destroy."}
{"level":"info","message":"Apply complete! Resources: 0 added, 0 changed, 0 destroyed."}
{"level":"info","message":"Outputs: 0"}
{"level":"warn","message":"Error connecting to ACPI socket: connect ENOENT /var/run/acpid.socket. The acpid.service helps with instance termination detection."}
{"level":"info","message":"POST /repos/leonardcser/mlops-test/actions/runners/registration-token - 201 in 317ms"}
{"date":"2023-08-03T09:15:06.304Z","level":"info","message":"runner status","repo":"https://github.com/leonardcser/mlops-test","status":"ready"}
{"level":"info","message":"Unregistering runner cml-bo4s2uhzqs-2qx6z08y-ig1rgwq0..."}
{"level":"info","message":"GET /repos/leonardcser/mlops-test/actions/runners?per_page=100 - 200 in 277ms"}
{"level":"info","message":"DELETE /repos/leonardcser/mlops-test/actions/runners/23 - 204 in 360ms"}
{"level":"info","message":"\tSuccess"}
{"level":"info","message":"Waiting 10 seconds to destroy"}

This output is similar to the one reported in this CML issue: iterative/cml#1332
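For anyone triaging a similar pod log, here is a minimal sketch of how the JSON lines could be filtered to show the runner lifecycle. It is not part of CML; the function names `parse_log_lines` and `summarize` are hypothetical, and the sample lines are copied from the output above. It only assumes that CML log lines are JSON objects with `level` and `message` keys, as they appear in this log.

```python
import json


def parse_log_lines(lines):
    """Yield parsed JSON log entries, skipping non-JSON lines
    (curl progress output, bash errors, and so on)."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Truncated or malformed entry: ignore it.
            continue


def summarize(lines):
    """Collect warnings and check whether the runner became ready
    and whether it was unregistered afterwards."""
    events = list(parse_log_lines(lines))
    return {
        "warnings": [e["message"] for e in events if e.get("level") == "warn"],
        "ready": any(e.get("status") == "ready" for e in events),
        "unregistered": any(
            e.get("message", "").startswith("Unregistering runner")
            for e in events
        ),
    }


# Sample lines copied from the pod log in this issue.
SAMPLE = [
    "bash: line 24: lsof: command not found",
    '{"level":"info","message":"Launching github runner"}',
    '{"level":"warn","message":"Error connecting to ACPI socket: connect ENOENT /var/run/acpid.socket. The acpid.service helps with instance termination detection."}',
    '{"date":"2023-08-03T09:15:06.304Z","level":"info","message":"runner status","repo":"https://github.com/leonardcser/mlops-test","status":"ready"}',
    '{"level":"info","message":"Unregistering runner cml-bo4s2uhzqs-2qx6z08y-ig1rgwq0..."}',
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On the log above, this kind of summary makes the pattern easier to spot: the runner reports `ready` and is then unregistered, while the workflow side keeps waiting.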

@leonardcser leonardcser added bug Something isn't working help wanted Extra attention is needed labels Aug 3, 2023
@ludelafo ludelafo linked a pull request Aug 15, 2023 that will close this issue
@ludelafo
Contributor

I can confirm that I have the same issue on my side. I don't know yet why it no longer works, but I'll let you know when I've found something.

@ludelafo ludelafo removed a link to a pull request Aug 21, 2023
@ludelafo ludelafo linked a pull request Aug 21, 2023 that will close this issue
@ludelafo
Contributor

@rmarquis, @leonardcser, I have added a new comment to the CML issue I opened last year about this problem. You can find it here: iterative/cml#1415 (comment).

@rmarquis
Contributor

We're moving away from CML as a k8s registration tool because of the aforementioned issue. CML development also seems to be in maintenance mode, with little activity anymore.

We'll keep using CML for reporting in GitHub comments.

3 participants