-
Hi, I have a non-functional node (node 184). It is now stuck in 'protocol info' mode. How can I force it to DEAD status so that I can remove it? This dead node is causing a lot of trouble on my network. In the driver log, you can see many 'Failed to execute controller command after 2/3 attempts. Scheduling next try i...' messages. What should I do?
-
Hm, it seems like the controller goes unresponsive and never really recovers. @robertsLando I think we talked about adding an option to disable all controller recovery mechanisms. If this exists, maybe enabling that option and waiting a longer time (10-15 minutes) before attempting to remove the node could help.
-
OK, I've done your test and it doesn't work. Summary:
I can see that the interview of node 184 is attempted but fails: "2024-01-10T08:51:28.809Z CNTRLR [Node 184] Interview attempt 2/5 failed, retrying in 10000 ms.."
Then I kept waiting, doing nothing (although my home automation was probably sending a few commands).
2024-01-10T08:54:10.447Z CNTRLR [Node 184] Failed all interview attempts, giving up.
08:56: I clicked the 'Failed node' button on node 184 in order to remove it, as asked.
The full log
Node 184 still has the status 'Unknown'.
-
I clicked the 'Failed node' button on node 184 again.
2024-01-10T09:00:40.542Z CNTRLR « [Node 175] received wakeup notification
Full log here:
-
Well, I've started reading the code. Based on the log, I can say the following. The function removeFailedNode is called, which then calls removeFailedNodeInternal. removeFailedNodeInternal first checks whether the node can be found, which must be FALSE, and my ping does fail. Then the message to remove the node is sent to the driver. But the driver can't manage to send the message, and the log only shows the generic error: it tries 3 times, and 3 times the driver fails to execute the message, with no further information. To find out why the driver fails, I would need more detailed logs.
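To make the flow described above easier to follow, here is a rough, synchronous sketch of it. This is not the actual zwave-js code: `FailedNodeOps`, `removeFailedNodeSketch`, and the default of 3 attempts are hypothetical names and values chosen for illustration, and the real driver calls are asynchronous.

```typescript
// Hypothetical, simplified model of the flow described in this thread.
// None of these names are the real zwave-js API.
interface FailedNodeOps {
  ping(nodeId: number): boolean;          // true if the node answered
  sendRemoveFailed(nodeId: number): void; // throws if the controller does not respond
}

type RemoveResult =
  | { ok: true; attempts: number }
  | { ok: false; reason: string };

function removeFailedNodeSketch(
  ops: FailedNodeOps,
  nodeId: number,
  maxAttempts = 3, // assumption, taken from the "2/3 attempts" log message
): RemoveResult {
  // Step 1: the node must NOT answer, otherwise it is not considered failed.
  if (ops.ping(nodeId)) {
    return { ok: false, reason: "node responded to ping" };
  }

  // Step 2: send the removal command, retrying when it fails.
  let lastError = "unknown error";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      ops.sendRemoveFailed(nodeId);
      return { ok: true, attempts: attempt };
    } catch (e) {
      // Keep the last error so the final result says why the command failed.
      lastError = e instanceof Error ? e.message : String(e);
    }
  }
  return {
    ok: false,
    reason: `gave up after ${maxAttempts} attempts: ${lastError}`,
  };
}
```

In this sketch the reason for the last failure is kept in the result, which is the piece of information missing from the generic error in the non-debug log.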
-
The log you posted is not on loglevel "debug", so it does not contain the reason for the command failures.
-
OK, you're right, here is the proper debug log:
2024-01-12T06:59:41.994Z CNTRLR [Node 184] ping failed: Timeout while waiting for an ACK from the controller (
Full log
-
OK, new try:
2024-01-13T08:25:34.188Z CNTRLR [Node 184] ping failed: Timeout while waiting for an ACK from the controller (
Same problem. I restarted with the timeout set to 60000... First try: not OK. Last try... => OK !!!!!! Yataaaaa
So now I have a question: would it be possible to temporarily raise the timeout to the maximum, only while removeFailedNode is running? I'm sure I'm not the only one with this problem!
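As an illustration of that idea (raise the timeout only for the duration of the removal, then put it back), here is a minimal sketch. `TimeoutSettings`, `responseTimeoutMs`, and `withTemporaryTimeout` are hypothetical names, not existing driver options, and the sketch is synchronous for simplicity; a real version would have to await the asynchronous removal inside the `try`.

```typescript
// Hypothetical settings object; not the real driver options.
interface TimeoutSettings {
  responseTimeoutMs: number;
}

// Runs `action` with a temporarily changed timeout and restores the
// previous value afterwards, even if `action` throws.
function withTemporaryTimeout<T>(
  settings: TimeoutSettings,
  temporaryMs: number,
  action: () => T,
): T {
  const previous = settings.responseTimeoutMs;
  settings.responseTimeoutMs = temporaryMs;
  try {
    return action();
  } finally {
    // Always restore the original timeout.
    settings.responseTimeoutMs = previous;
  }
}

// Usage sketch: the starting value of 10000 is only illustrative.
const exampleSettings: TimeoutSettings = { responseTimeoutMs: 10000 };
withTemporaryTimeout(exampleSettings, 60000, () => {
  // The removal of node 184 would be triggered here.
});
```

Restoring the value in `finally` means a failed removal attempt does not leave the long timeout in place for normal traffic.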
Here's where it starts: