
error hook failed: "start" in the middle of deployment #282

Open
nobuto-m opened this issue Aug 14, 2024 · 2 comments

@nobuto-m (Contributor)

Bug Description

The charm unnecessarily goes into error status in the middle of deployment.

13 Aug 2024 15:24:36Z juju-unit error hook failed: "start"

$ juju show-status-log alertmanager/0 --days 1
Time                   Type       Status       Message
13 Aug 2024 15:23:38Z  workload   waiting      installing agent
13 Aug 2024 15:23:38Z  juju-unit  allocating   
13 Aug 2024 15:24:07Z  workload   waiting      agent initialising
13 Aug 2024 15:24:11Z  workload   maintenance  installing charm software
13 Aug 2024 15:24:11Z  juju-unit  executing    running install hook
13 Aug 2024 15:24:14Z  juju-unit  executing    running alerting-relation-created hook
13 Aug 2024 15:24:16Z  juju-unit  executing    running catalogue-relation-created hook
13 Aug 2024 15:24:17Z  juju-unit  executing    running grafana-dashboard-relation-created hook
13 Aug 2024 15:24:19Z  juju-unit  executing    running ingress-relation-created hook
13 Aug 2024 15:24:20Z  juju-unit  executing    running replicas-relation-created hook
13 Aug 2024 15:24:21Z  juju-unit  executing    running self-metrics-endpoint-relation-created hook
13 Aug 2024 15:24:22Z  juju-unit  executing    running grafana-source-relation-created hook
13 Aug 2024 15:24:23Z  juju-unit  executing    running leader-elected hook
13 Aug 2024 15:24:25Z  juju-unit  executing    running alertmanager-pebble-ready hook
13 Aug 2024 15:24:26Z  juju-unit  executing    running data-storage-attached hook
13 Aug 2024 15:24:27Z  juju-unit  executing    running config-changed hook
13 Aug 2024 15:24:36Z  juju-unit  error        hook failed: "start"
13 Aug 2024 15:24:41Z  juju-unit  executing    running start hook
13 Aug 2024 15:24:44Z  workload   unknown      
13 Aug 2024 15:24:44Z  juju-unit  executing    running alertmanager-pebble-ready hook
13 Aug 2024 15:24:47Z  juju-unit  executing    running catalogue-relation-joined hook for catalogue/0
13 Aug 2024 15:24:49Z  juju-unit  executing    running alerting-relation-joined hook for loki/0

To Reproduce

  1. juju deploy cos-lite --trust --channel latest/edge

Environment

$ juju status
Model  Controller       Cloud/Region            Version  SLA          Timestamp
cos    maas-controller  cos-microk8s/localhost  3.5.3    unsupported  01:04:15Z

App           Version  Status  Scale  Charm             Channel      Rev  Address         Exposed  Message
alertmanager  0.27.0   active      1  alertmanager-k8s  latest/edge  129  10.152.183.171  no       
catalogue              active      1  catalogue-k8s     latest/edge   59  10.152.183.188  no       
grafana       9.5.3    active      1  grafana-k8s       latest/edge  118  10.152.183.65   no       
loki          2.9.6    active      1  loki-k8s          latest/edge  163  10.152.183.227  no       
prometheus    2.52.0   active      1  prometheus-k8s    latest/edge  210  10.152.183.87   no       
traefik       2.11.0   active      1  traefik-k8s       latest/edge  203  10.152.183.222  no       Serving at 192.168.151.81

Relevant log output

unit-alertmanager-0: 15:24:29 DEBUG unit.alertmanager/0.juju-log receive_response_body.complete
unit-alertmanager-0: 15:24:29 DEBUG unit.alertmanager/0.juju-log response_closed.started
unit-alertmanager-0: 15:24:29 DEBUG unit.alertmanager/0.juju-log response_closed.complete
unit-alertmanager-0: 15:24:29 INFO juju.worker.caasunitterminationworker terminating due to SIGTERM
unit-alertmanager-0: 15:24:29 INFO juju.worker.uniter.operation ran "config-changed" hook (via hook dispatching script: dispatch)
unit-alertmanager-0: 15:24:29 INFO juju.worker.uniter unit "alertmanager/0" shutting down: agent should be terminated
unit-alertmanager-0: 15:24:32 INFO juju.cmd running containerAgent [3.5.3 63d460f9ee6c7c710131961390687e7a0ab90470 gc go1.21.12]
unit-alertmanager-0: 15:24:32 INFO juju.cmd.containeragent.unit start "unit"
unit-alertmanager-0: 15:24:32 INFO juju.worker.upgradesteps upgrade steps for 3.5.3 have already been run.
unit-alertmanager-0: 15:24:32 INFO juju.worker.probehttpserver starting http server on 127.0.0.1:65301
unit-alertmanager-0: 15:24:32 INFO juju.api connection established to "wss://192.168.151.101:17070/model/e1604679-6045-410d-8aab-53bf0f2ea605/api"
unit-alertmanager-0: 15:24:32 INFO juju.worker.apicaller [e16046] "unit-alertmanager-0" successfully connected to "192.168.151.101:17070"
unit-alertmanager-0: 15:24:32 INFO juju.api connection established to "wss://192.168.151.101:17070/model/e1604679-6045-410d-8aab-53bf0f2ea605/api"
unit-alertmanager-0: 15:24:32 INFO juju.worker.apicaller [e16046] "unit-alertmanager-0" successfully connected to "192.168.151.101:17070"
unit-alertmanager-0: 15:24:32 INFO juju.worker.migrationminion migration migration phase is now: NONE
unit-alertmanager-0: 15:24:32 INFO juju.worker.logger logger worker started
unit-alertmanager-0: 15:24:32 WARNING juju.worker.proxyupdater unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
unit-alertmanager-0: 15:24:32 INFO juju.worker.leadership alertmanager/0 promoted to leadership of alertmanager
unit-alertmanager-0: 15:24:32 INFO juju.worker.caasupgrader abort check blocked until version event received
unit-alertmanager-0: 15:24:32 INFO juju.worker.caasupgrader unblocking abort check
unit-alertmanager-0: 15:24:32 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-alertmanager-0
unit-alertmanager-0: 15:24:32 INFO juju.worker.uniter unit "alertmanager/0" started
unit-alertmanager-0: 15:24:32 INFO juju.worker.uniter hooks are retried true
unit-alertmanager-0: 15:24:33 INFO juju.worker.uniter.charm downloading ch:amd64/focal/alertmanager-k8s-129 from API server
unit-alertmanager-0: 15:24:33 INFO juju.downloader downloading from ch:amd64/focal/alertmanager-k8s-129
unit-alertmanager-0: 15:24:33 INFO juju.downloader download complete ("ch:amd64/focal/alertmanager-k8s-129")
unit-alertmanager-0: 15:24:33 INFO juju.downloader download verified ("ch:amd64/focal/alertmanager-k8s-129")
unit-alertmanager-0: 15:24:36 INFO juju.worker.uniter awaiting error resolution for "start" hook
unit-alertmanager-0: 15:24:37 INFO juju.worker.uniter awaiting error resolution for "start" hook
unit-alertmanager-0: 15:24:41 INFO juju.worker.uniter awaiting error resolution for "start" hook
unit-alertmanager-0: 15:24:43 DEBUG unit.alertmanager/0.juju-log ops 2.15.0 up and running.

Additional context

cos-lite_debug.log

@lucabello (Contributor)

We should probably implement a guard to make sure the Kubernetes resource patch is ready before we try to do anything, like was done in this PR.
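For illustration, such a guard could work roughly like this: every hook handler first checks whether the resource patch has been applied, and defers the event otherwise instead of proceeding and failing. This is a minimal, framework-agnostic sketch, not the actual charm-lib API; the names `Event`, `patch_ready`, and `guarded_handler` are all hypothetical.

```python
# Hypothetical sketch of a readiness guard for hook handlers.
# Instead of running a hook while the Kubernetes resource patch is
# still in flight (and the pod may be about to restart), the handler
# defers the event so Juju re-dispatches it later.

class Event:
    """Stand-in for an ops framework event object."""

    def __init__(self, name: str):
        self.name = name
        self.deferred = False

    def defer(self) -> None:
        # In ops, defer() re-queues the event for the next dispatch.
        self.deferred = True


def guarded_handler(event: Event, patch_ready, handler) -> None:
    """Run `handler(event)` only once `patch_ready()` reports True."""
    if not patch_ready():
        # Patch not applied yet: retry on the next dispatch instead of
        # raising and putting the unit into error status.
        event.defer()
        return
    handler(event)
```

With a guard like this, the mid-deployment restart would show up as a deferred `start` event rather than `hook failed: "start"`.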

@Abuelodelanada (Contributor)

Hey @nobuto-m

I've deployed the same revisions you had:

Model  Controller  Cloud/Region        Version  SLA          Timestamp
cos    microk8s    microk8s/localhost  3.5.4    unsupported  17:44:01-03:00

App           Version  Status  Scale  Charm             Channel      Rev  Address         Exposed  Message
alertmanager  0.27.0   active      1  alertmanager-k8s  latest/edge  129  10.152.183.108  no       
catalogue              active      1  catalogue-k8s     latest/edge   58  10.152.183.135  no       
grafana       9.5.3    active      1  grafana-k8s       latest/edge  118  10.152.183.21   no       
loki          2.9.6    active      1  loki-k8s          latest/edge  163  10.152.183.84   no       
prometheus    2.52.0   active      1  prometheus-k8s    latest/edge  210  10.152.183.177  no       
traefik       2.11.0   active      1  traefik-k8s       latest/edge  203  10.152.183.222  no       Serving at 192.168.1.240

Unit             Workload  Agent  Address     Ports  Message
alertmanager/0*  active    idle   10.1.9.222         
catalogue/0*     active    idle   10.1.9.210         
grafana/0*       active    idle   10.1.9.230         
loki/0*          active    idle   10.1.9.235         
prometheus/0*    active    idle   10.1.9.220         
traefik/0*       active    idle   10.1.9.226         Serving at 192.168.1.240

And I wasn't able to reproduce the issue:

$ juju show-status-log alertmanager/0 --days 1
Time                        Type       Status       Message
18 Sep 2024 17:15:59-03:00  workload   waiting      installing agent
18 Sep 2024 17:15:59-03:00  juju-unit  allocating   
18 Sep 2024 17:16:15-03:00  workload   waiting      agent initialising
18 Sep 2024 17:16:42-03:00  workload   maintenance  installing charm software
18 Sep 2024 17:16:42-03:00  juju-unit  executing    running install hook
18 Sep 2024 17:16:45-03:00  juju-unit  executing    running alerting-relation-created hook
18 Sep 2024 17:16:47-03:00  juju-unit  executing    running catalogue-relation-created hook
18 Sep 2024 17:16:48-03:00  juju-unit  executing    running grafana-dashboard-relation-created hook
18 Sep 2024 17:16:50-03:00  juju-unit  executing    running grafana-source-relation-created hook
18 Sep 2024 17:16:51-03:00  juju-unit  executing    running ingress-relation-created hook
18 Sep 2024 17:16:52-03:00  juju-unit  executing    running replicas-relation-created hook
18 Sep 2024 17:16:53-03:00  juju-unit  executing    running self-metrics-endpoint-relation-created hook
18 Sep 2024 17:16:54-03:00  juju-unit  executing    running leader-elected hook
18 Sep 2024 17:16:55-03:00  juju-unit  executing    running alertmanager-pebble-ready hook
18 Sep 2024 17:16:56-03:00  juju-unit  executing    running data-storage-attached hook
18 Sep 2024 17:17:42-03:00  juju-unit  executing    running config-changed hook
18 Sep 2024 17:17:53-03:00  juju-unit  executing    running start hook
18 Sep 2024 17:17:55-03:00  juju-unit  executing    running alertmanager-pebble-ready hook
18 Sep 2024 17:17:58-03:00  juju-unit  executing    running ingress-relation-joined hook for traefik/0
18 Sep 2024 17:17:59-03:00  juju-unit  executing    running alerting-relation-joined hook for prometheus/0
18 Sep 2024 17:18:01-03:00  juju-unit  executing    running ingress-relation-changed hook for traefik/0
18 Sep 2024 17:18:02-03:00  juju-unit  executing    running catalogue-relation-joined hook for catalogue/0
18 Sep 2024 17:18:04-03:00  juju-unit  executing    running grafana-dashboard-relation-joined hook for grafana/0
18 Sep 2024 17:18:04-03:00  juju-unit  idle         
18 Sep 2024 17:18:05-03:00  juju-unit  executing    running catalogue-relation-changed hook for catalogue/0
18 Sep 2024 17:18:06-03:00  juju-unit  executing    running grafana-dashboard-relation-changed hook for grafana/0
18 Sep 2024 17:18:07-03:00  juju-unit  idle         
18 Sep 2024 17:18:07-03:00  juju-unit  executing    running self-metrics-endpoint-relation-joined hook for prometheus/0
18 Sep 2024 17:18:08-03:00  juju-unit  executing    running grafana-source-relation-joined hook for grafana/0
18 Sep 2024 17:18:09-03:00  juju-unit  executing    running grafana-source-relation-changed hook for grafana/0
18 Sep 2024 17:18:10-03:00  juju-unit  executing    running alerting-relation-changed hook for prometheus/0
18 Sep 2024 17:18:11-03:00  juju-unit  executing    running alerting-relation-joined hook for loki/0
18 Sep 2024 17:18:12-03:00  juju-unit  executing    running self-metrics-endpoint-relation-changed hook for prometheus/0
18 Sep 2024 17:18:13-03:00  juju-unit  executing    running ingress-relation-changed hook
18 Sep 2024 17:18:16-03:00  workload   active       
18 Sep 2024 17:18:17-03:00  juju-unit  executing    running alerting-relation-changed hook for loki/0
18 Sep 2024 17:18:18-03:00  juju-unit  idle         

I believe it's a timing issue... How much RAM and CPU does the machine have? 🤔

Could you please try deploying from edge again and tell me if you see the same issue?
