This repository has been archived by the owner on May 8, 2024. It is now read-only.
The idea is to have a program running in a separate process or thread (via `node:child_process` or `node:worker_threads`) that will monitor the main process.
The main process will periodically send data to the watchdog to signal its state (something like `alive` every second).
If the server does not send any data for a long time (probably a minute), the watchdog will start the data loss mitigation protocol.
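A minimal sketch of the heartbeat check, assuming the watchdog runs in a `node:worker_threads` worker. `HeartbeatWatchdog`, `beat()`, `check()` and the injectable `now` clock are hypothetical names introduced for illustration; the one-minute default mirrors the "probably a minute" above.

```javascript
// Hypothetical heartbeat watchdog: the main process reports "alive",
// and the watchdog trips once if no report arrives within timeoutMs.
class HeartbeatWatchdog {
  constructor({ timeoutMs = 60_000, onTimeout, now = Date.now } = {}) {
    this.timeoutMs = timeoutMs;
    this.onTimeout = onTimeout ?? (() => {});
    this.now = now; // injectable clock so the logic can be tested without timers
    this.lastBeat = now();
    this.tripped = false;
  }

  // Record an "alive" message from the main process.
  beat() {
    this.lastBeat = this.now();
    this.tripped = false;
  }

  // Call periodically; returns true only on the call that trips the watchdog.
  check() {
    if (this.tripped) return false;
    if (this.now() - this.lastBeat > this.timeoutMs) {
      this.tripped = true;
      this.onTimeout(); // e.g. start the data loss mitigation protocol
      return true;
    }
    return false;
  }
}
```

In the main process this could be wired up with `setInterval(() => worker.postMessage("alive"), 1000)`, and inside the worker with `parentPort.on("message", ...)` calling `beat()` plus its own `setInterval` calling `check()`. Because the worker has its own event loop, its timer keeps firing even while the main thread is stuck.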
Data Loss Mitigation Protocol
The watchdog could simply kill the main process, but that would discard unsaved progress and roll the server back.
My idea is for the watchdog to have a `node:inspector` instance inspecting the main process.
Upon detecting catastrophic lag, the watchdog will send a signal telling the server to disconnect all clients, save the world, and shut down (to mitigate data loss). However, the server will not act on that signal, because it is stuck in a loop, so it first has to be forced out of the loop.
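A sketch of the main-process side of that shutdown request. The `"shutdown"` message and the `server` object with `disconnectAll()`, `saveWorld()` and `shutdown()` are assumptions for illustration, not an existing API. As described above, this handler can only run once the main thread's event loop is free again.

```javascript
// Hypothetical handler for messages coming from the watchdog.
// `server` is an assumed object exposing disconnectAll(), saveWorld() and shutdown().
async function handleWatchdogMessage(message, server) {
  if (message !== "shutdown") return false; // ignore anything else
  try {
    server.disconnectAll();   // stop clients from producing new changes
    await server.saveWorld(); // persist progress before exiting
  } finally {
    server.shutdown();        // stop even if saving failed (a design choice)
  }
  return true;
}
```

It could be registered with `worker.on("message", (msg) => handleWatchdogMessage(msg, server))`. Running `shutdown()` in `finally` means the server still stops if saving throws; whether that is preferable to staying up is a judgment call.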
I have many ideas for getting out of the loop programmatically, some dead simple and some overly complex; here are some of them:
- `break` (probably at 1-second intervals): this should hopefully break out of `while (true) {}` loops, but might not get out of more complicated code.
- Negate the condition of an `if`, `while`, or `for` with a probability that starts at 0% and rises very slowly, to avoid catastrophic data corruption and fatal errors.
- Keep a "stuck" counter of how many times each line is visited. If some line gets more than N visits (probably 1024 or a similarly large number), apply one or more of the methods above. Reset the counter when a new line is visited. This might help avoid false positives.
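The stuck counter and the slowly rising negation chance could be sketched as below. This assumes the watchdog receives a stream of visited line identifiers (for example, the `callFrames[0].location` reported by the inspector's `Debugger.paused` event when it periodically pauses the main process), and it reads "a new line" as a line never visited before. `StuckDetector`, `negationProbability`, `shouldNegate` and the 10-minute linear ramp are hypothetical.

```javascript
// Counts visits per line; reports "stuck" once a line exceeds the threshold.
// Assumption: "a new line" means a line id that has never been visited before.
class StuckDetector {
  constructor(threshold = 1024) {
    this.threshold = threshold;
    this.visits = new Map(); // line id -> visits since the last reset
    this.seen = new Set();   // every line id ever visited
  }

  // Records one visit and returns true if this line now looks stuck.
  visit(line) {
    if (!this.seen.has(line)) {
      // Reaching a never-visited line suggests progress, so reset all counts.
      this.seen.add(line);
      this.visits.clear();
    }
    const count = (this.visits.get(line) ?? 0) + 1;
    this.visits.set(line, count);
    return count > this.threshold;
  }
}

// Chance of negating an if/while/for condition: 0 when the lag starts,
// rising linearly to 1 over rampMs (the 10-minute default is a guess).
function negationProbability(stuckMs, rampMs = 600_000) {
  return Math.min(1, Math.max(0, stuckMs / rampMs));
}

// Decides whether to negate this time; `random` is injectable for testing.
function shouldNegate(stuckMs, random = Math.random, rampMs = 600_000) {
  return random() < negationProbability(stuckMs, rampMs);
}
```

A linear ramp is only one choice; any curve that stays near 0 for a long time fits the "very slowly go up" requirement.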
A problem with all of these anti-loop solutions is that they create invalid states (e.g. a function that was supposed to return a string returns `undefined` because it exited the loop prematurely), which might cause data corruption.
To mitigate this, the previous world should be backed up; on server start, the server should try to load the latest world and, if it is corrupted, load the backup.
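A minimal sketch of the backup-and-fallback scheme, assuming the world is stored as a single JSON file. The file names `world.json` and `world.backup.json`, and the use of a failed `JSON.parse` as the corruption check, are assumptions; a real world format would need its own validation.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical file names; the storage format is an assumption.
const LATEST = "world.json";
const BACKUP = "world.backup.json";

// Returns the parsed world, or undefined if the file is missing or corrupted.
// A JSON.parse failure stands in for "corrupted" here.
function readWorld(file) {
  try {
    return JSON.parse(fs.readFileSync(file, "utf8"));
  } catch {
    return undefined;
  }
}

function saveWorld(dir, world) {
  const latest = path.join(dir, LATEST);
  // Only rotate a readable save into the backup slot, so a corrupted
  // latest file can never overwrite a good backup.
  if (readWorld(latest) !== undefined) {
    fs.copyFileSync(latest, path.join(dir, BACKUP));
  }
  fs.writeFileSync(latest, JSON.stringify(world));
}

// On startup: try the latest world first, then fall back to the backup.
function loadWorld(dir) {
  for (const name of [LATEST, BACKUP]) {
    const world = readWorld(path.join(dir, name));
    if (world !== undefined) return { world, source: name };
  }
  return { world: undefined, source: undefined };
}
```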
A better solution might be a lenient data parser that tries its best not to crash when it encounters unexpected data.
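One way such a lenient parser could look, using a hypothetical player schema (`name`, `x`, `y`, `health` and their defaults are made up for illustration): invalid fields are replaced with defaults and reported through a `warn` callback instead of throwing.

```javascript
// Hypothetical schema and defaults, for illustration only.
const DEFAULT_PLAYER = { name: "unknown", x: 0, y: 0, health: 20 };

function toFiniteNumber(value, fallback) {
  return typeof value === "number" && Number.isFinite(value) ? value : fallback;
}

// Parses one player record, substituting defaults instead of throwing.
function parsePlayerLoosely(raw, warn = () => {}) {
  if (raw === null || typeof raw !== "object") {
    warn(`player is ${raw === null ? "null" : typeof raw}, using defaults`);
    return { ...DEFAULT_PLAYER };
  }
  const player = { ...DEFAULT_PLAYER };
  if (typeof raw.name === "string") player.name = raw.name;
  else warn("invalid name");
  for (const key of ["x", "y", "health"]) {
    player[key] = toFiniteNumber(raw[key], DEFAULT_PLAYER[key]);
    if (player[key] !== raw[key]) warn(`invalid ${key}`);
  }
  return player;
}

// Parses a whole world; a missing or malformed player list becomes empty.
function parseWorldLoosely(raw, warn = () => {}) {
  const valid = Array.isArray(raw?.players);
  if (!valid) warn("players is not an array");
  const players = valid ? raw.players : [];
  return { players: players.map((p) => parsePlayerLoosely(p, warn)) };
}
```

Reporting every substitution keeps silent corruption visible in the logs, even though the server keeps running.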
Dinhero21 changed the title from anti-while-loop-watchdog to anti-catastrophic-lag-watchdog on Oct 28, 2023.