startup and cfg issues with opamp supervisor (last_recv_remote_config.dat, opamp_server_port) #36196
Comments
".. continuously and never get in the situation to end up in a failed state where agent is no started. 1.) is no startup fail state - it is a bug i would say. " Revisiting this sentence i think now this shall be handled like a "noop" cfg case, means nothing yet received is at the same level like noop received - defaulted to noop on "initial run" -and its a valid "good case" .. to wait until connected An initial state - can be the case when agent id file is not existing which shall fallback to noop |
If i terminate the process and start the supervisor again it seems to clean up(effective cfg yaml ports are correct then and collector is able to connect) , but actually this manually work around will not happen in standard workflow. It shall behave on the first initial start as it does on the second |
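The noop fallback proposed in the comments above can be sketched in Go roughly as follows. This is a minimal illustration, not the supervisor's actual code: `loadLastReceivedConfig`, `noopConfig`, and the `storage/` path are hypothetical names chosen for the example. A missing cache file is treated as a normal first-run state, while any other read error is reported and causes the process to exit.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// noopConfig is a hypothetical placeholder for whatever minimal config the
// supervisor would run with before anything has been received.
const noopConfig = "# noop: nothing received from the OpAMP server yet\n"

// loadLastReceivedConfig (hypothetical helper) returns the cached remote
// config if the file exists. A missing file is not an error: it is treated
// as the first-run case and the noop config is returned instead.
func loadLastReceivedConfig(path string) (cfg string, fromCache bool, err error) {
	data, err := os.ReadFile(path)
	switch {
	case err == nil:
		return string(data), true, nil
	case errors.Is(err, fs.ErrNotExist):
		// Fresh start: nothing was ever downloaded and cached.
		return noopConfig, false, nil
	default:
		// Real failure (permissions, I/O error, ...): surface it.
		return "", false, fmt.Errorf("reading %s: %w", path, err)
	}
}

func main() {
	// The storage path is an assumption made for this example.
	cfg, fromCache, err := loadLastReceivedConfig("storage/last_recv_remote_config.dat")
	if err != nil {
		// A genuine startup failure: exit instead of only logging.
		fmt.Fprintln(os.Stderr, "fatal:", err)
		os.Exit(1)
	}
	fmt.Printf("fromCache=%v, %d bytes of config\n", fromCache, len(cfg))
}
```

With this shape, the first start and any later start take the same code path; only the source of the config differs.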
Hey Fabian, so looking into this I think there's a couple things going on.
Let me know if this helps or if there's anything else in your post that I missed!
Edit: fix link to issue in bullet point 1.
Thanks for looking into it. I will test 0.113.0 soon and check whether the port config works for me as well.
Tested and working.
Component(s)
cmd/opampsupervisor
What happened?
I built the collector and supervisor from tag 0.112.0 and observed the following issues.
1.) Looking at the code, there seems to be no tolerance for the file "last_recv_remote_config.dat" not existing.
This can happen on a fresh initial start, when no config has ever been downloaded and cached. Is there a test covering this scenario?
opentelemetry-collector-contrib/cmd/opampsupervisor/supervisor/supervisor.go
Line 762 in 740d9aa
2.) The supervisor should tolerate 1.), keep retrying the connection to the OpAMP backend until it is reachable, and not log "... not starting agent.". The agent (the supervisor's client to the OpAMP backend) should retry continuously and never end up in a failed state where the agent is not started. 1.) is not a startup failure state; it is a bug, I would say. In any real startup failure state (e.g. the supervisor.yaml config file cannot be read), the supervisor should terminate itself by exiting rather than only writing error logs.
3.) But maybe I am misreading the info message "No config present, not starting agent.", because the collector is in fact still started, but with the wrong OpAMP port config: one that differs from what I configured in supervisor.yaml.
supervisor.yaml
After the initial start of the supervisor (see 1.), the generated effective.yaml looks like below.
As you can see, the OpAMP server port does NOT use the configured "opamp_server_port: 12548" but still uses a random port, 36455.
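The behaviour expected in 3.) can be sketched in Go as follows. This is an illustration under assumptions, not the supervisor's actual implementation or the fix that was released: `resolveOpAMPPort` is a hypothetical helper, and it assumes that an unset `opamp_server_port` is represented as 0. A configured port is always honoured; only when none is configured does the code ask the OS for a free one.

```go
package main

import (
	"fmt"
	"net"
)

// resolveOpAMPPort (hypothetical helper) returns the configured port when it
// is set, and otherwise asks the OS for a free port on localhost.
// Assumption: an unset opamp_server_port is represented as 0.
func resolveOpAMPPort(configured int) (int, error) {
	if configured != 0 {
		// An explicit setting must win over any random choice.
		return configured, nil
	}
	// Port 0 lets the OS pick a currently free port.
	l, err := net.Listen("tcp", "localhost:0")
	if err != nil {
		return 0, fmt.Errorf("finding a free port: %w", err)
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	// 12548 is the value from the supervisor.yaml in this report.
	port, err := resolveOpAMPPort(12548)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("OpAMP server port:", port)
}
```

Whatever value is resolved here would then have to be the one written into the collector's effective config, so that the first start and later starts agree.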
Collector version
0.112.0
Environment information
Environment
OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")
OpenTelemetry Collector configuration
No response
Log output
No response
Additional context
related
#36001
ArthurSens@65992b7
@djaglowski fyi