Service and Endpoints for the node exporters are not correctly configured #826
Comments
@SSvilen thanks for the provided information. As you can see in the manifests/windows-exporter_v1_service.yaml, the type is not set to ClusterIP.
The monitoring for the namespace is enabled. The problem is that the node exporter is installed on the Windows worker nodes and is not running as a pod, as it is on Linux-based nodes. So the Prometheus Operator cannot properly discover the endpoint for that ServiceMonitor.
@SSvilen Thanks for confirming that.
OK, I see what's happening.
and based on the code in metrics.go an internal IP address is expected. The status field of the machine shows type 'InternalDNS':
status:
  addresses:
  - address: winmach-q84jj
    type: InternalDNS
I'm not sure why that is.
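The mismatch described above can be illustrated with a short sketch. This is Python for illustration only, not the operator's actual code (WMCO is written in Go), and the function name `internal_ip` is hypothetical. It assumes the consumer looks for an address entry of type `InternalIP`, as the comment about metrics.go suggests.

```python
from typing import Optional


def internal_ip(addresses: list[dict]) -> Optional[str]:
    """Return the first address of type InternalIP, or None if there is none.

    Hypothetical helper: it mirrors the expectation described in the thread,
    not the real implementation in metrics.go.
    """
    for entry in addresses:
        if entry.get("type") == "InternalIP":
            return entry.get("address")
    return None


# Addresses as reported in the Machine status quoted above.
reported = [{"address": "winmach-q84jj", "type": "InternalDNS"}]

# No InternalIP entry exists, so nothing can be used to build an endpoint.
print(internal_ip(reported))
```

With only an `InternalDNS` entry present, the lookup finds nothing, which would be consistent with the endpoint never being populated.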
@SSvilen can you provide details about the WMCO version, cloud provider, OCP version and the Windows Server version used for the VM?
WMCO 3.1
It would also be helpful to have a bit more logging, for instance here. That would make troubleshooting easier.
@SSvilen logging info noted.
Yes.
machinesSet network.txt Thanks!
@SSvilen Thanks for providing the logs. From the operator logs I can see that the IP address cannot be found to configure the Windows machine into a node. You should see the same issue if you try to
OK, thanks. We'll look at it again.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close.
/lifecycle rotten
I'm seeing the same issue on vSphere. Did you ever figure anything out?
@MattPOlson Can you provide details about your setup, including those requested in this comment, so I can help you further?
You need a working reverse DNS. During the addition of the Windows worker node, the operator creates the endpoints.
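A quick way to check the reverse DNS requirement mentioned above is sketched below. It only confirms that a PTR lookup resolves from the machine where it runs. Whether the operator performs the same lookup is an assumption, the helper name `reverse_lookup` is hypothetical, and `127.0.0.1` is a placeholder for a Windows node's IP.

```python
import socket
from typing import Optional


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the hostname that reverse DNS reports for ip, or None on failure."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None


# Placeholder address; substitute the IP of the Windows worker node.
print(reverse_lookup("127.0.0.1"))
```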
The cluster is running in vSphere and we are using machinesets to provision the servers. If I change the service to be of type 'ExternalName' and create an endpoint that includes the node, it works fine. It's just not happening automatically like it should.
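The manual workaround described above could look roughly like the following sketch, which builds an Endpoints manifest listing the Windows node IPs. The object name `windows-exporter`, the namespace `openshift-windows-machine-config-operator`, the port 9182 (the windows_exporter default) and the example IP are all assumptions. Adjust them to match the Service that actually exists in your cluster, since an Endpoints object must share the Service's name and namespace.

```python
import json


def windows_exporter_endpoints(
    node_ips: list[str],
    name: str = "windows-exporter",  # assumed Service name
    namespace: str = "openshift-windows-machine-config-operator",  # assumed namespace
    port: int = 9182,  # windows_exporter default port, assumed unchanged
) -> dict:
    """Build a core/v1 Endpoints manifest that points at the given node IPs."""
    subsets = []
    if node_ips:
        subsets.append(
            {
                "addresses": [{"ip": ip} for ip in node_ips],
                "ports": [{"name": "metrics", "port": port, "protocol": "TCP"}],
            }
        )
    return {
        "apiVersion": "v1",
        "kind": "Endpoints",
        "metadata": {"name": name, "namespace": namespace},
        "subsets": subsets,
    }


# Hypothetical node IP, for illustration only.
manifest = windows_exporter_endpoints(["10.0.0.10"])
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied by hand. It has to be regenerated whenever a Windows node joins or leaves, which is the part the thread expects the operator to do automatically.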
Reverse DNS lookup works fine in our network, but the internal IP still isn't being populated on the machine, so the endpoint isn't being created.
ping -a 10.33..
Pinging k8s-se-****************** [10.33..] with 32 bytes of data:
What do the logs from the operator say when you add a new machine?
It's throwing this error. I'm trying to figure out where/how in the code the operator gets the external IP address. They are provisioned as machine sets.
@MattPOlson can you add the full WMCO log snippet? Those are just the initial debug logs, which should resolve themselves once the IP for the machine is available.
Any updates on this? I feel like this is either a legit issue or something isn't documented correctly as far as the setup goes. I looked through the code but I can't figure out why the internal IP still isn't being populated on the machine, so the endpoint isn't being created.
@MattPOlson can I ask what OCP and WMCO version you are using?
@saifshaikh48 sure:
Interesting, that version should have the proper permissions.
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen.
/close
@openshift-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This is still an issue in version 5.1.1. I have to update the endpoint manually to get any metrics back from the Windows nodes.
/reopen
@MattPOlson: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
I'll look into this today.
/reopen
@sebsoto: Reopened this issue. In response to this:
Seeing
and
in the logs, but the namespace has the correct
WMCO checks whether metrics are enabled on the namespace it is deployed in only at startup. I'm thinking about two potential options to fix this:
/remove-lifecycle rotten
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close.
/lifecycle stale
This can be solved through https://issues.redhat.com/browse/WINC-545
/remove-lifecycle stale
/lifecycle frozen
The Windows node exporter is installed on all Windows worker nodes, but the required Service and Endpoints resources are not created correctly.
There is a Service object created, but it is of type ClusterIP, which in this case won't work.
The Service should be of type 'ExternalName' and the Endpoints should be updated by the operator on every node join/deletion operation.
For instance: