anti-catastrophic-lag-watchdog #12

Dinhero21 · 2023-10-28T03:07:51Z

The idea is to have a watchdog program running alongside the main process, either as a separate process (via node:child_process) or in a worker thread (via node:worker_threads), that will monitor the main process.

The main process will periodically send data to the watchdog to signal its state (something like an alive message every second).

If the server does not send any data for a long time (probably a minute), the watchdog will initiate the data loss mitigation protocol.
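A minimal sketch of the timeout check described above. The class name HeartbeatMonitor, the injectable clock, and the 60-second default are assumptions made for illustration, not part of the proposal.

```javascript
// Hypothetical helper: remembers when the last "alive" message arrived
// and reports a stall once the timeout has elapsed.
class HeartbeatMonitor {
  constructor(timeoutMs = 60_000, now = Date.now) {
    this.timeoutMs = timeoutMs;
    this.now = now; // injectable clock, handy for testing
    this.lastBeat = now();
  }

  // Call this whenever the main process sends its "alive" message.
  beat() {
    this.lastBeat = this.now();
  }

  // True once no heartbeat has been seen for at least timeoutMs.
  isStalled() {
    return this.now() - this.lastBeat >= this.timeoutMs;
  }
}
```

In the watchdog, beat() would be called from the message handler and isStalled() polled on a timer. If it returns true, the mitigation protocol starts.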

Data Loss Mitigation Protocol

The watchdog could simply kill the main process, but that would destroy progress and roll back the server.

My idea is for the watchdog to have a node:inspector instance inspecting the main process.

Upon noticing the catastrophic lag, the watchdog is going to send a signal to the server telling it to disconnect all clients, save the world, and shut down (to mitigate data loss). This will not work on its own, however, because the server is stuck in a loop and cannot handle the signal.

I have many ideas on how to get out of the loop programmatically, some dead simple and some overly complex. I will document some of them here:

1. Force a break (probably at intervals of 1 second). This should hopefully break out of while (true) {} loops, but might not be able to get out of more complicated code.
2. Have a chance of negating the condition of an if, while, or for. The chance would start at 0% and go up very slowly, to avoid catastrophic data corruption and fatal errors.
3. Have a "stuck" counter: count each time a line is visited, and if some line has more than N visits (probably 1024 or some similarly large number), apply one (or several) of the methods stated above. Reset the counter when a new line gets visited. This might be useful to avoid false positives.
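The "stuck" counter could look roughly like the sketch below. It assumes "a new line" means a location that has never been seen before, and that locations arrive as strings such as "file.js:42" from whatever tracing mechanism is used. The StuckCounter name and both assumptions are mine.

```javascript
// Hypothetical sketch of the "stuck" counter.
class StuckCounter {
  constructor(limit = 1024) {
    this.limit = limit;
    this.seen = new Set();   // every location ever visited
    this.counts = new Map(); // visits since the last reset
  }

  // Records a visit. Returns true when the location has been visited
  // more than `limit` times without any new location being reached.
  visit(location) {
    if (!this.seen.has(location)) {
      // A never-before-seen line means the program is making progress.
      this.seen.add(location);
      this.counts.clear();
      return false;
    }
    const n = (this.counts.get(location) || 0) + 1;
    this.counts.set(location, n);
    return n > this.limit;
  }
}
```

Keeping the set of seen locations separate from the counts matters. If the counts alone decided what is "new", a two-line loop would reset itself on every iteration and never trip the limit.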

A problem with all of these anti-loop solutions is that they create invalid states (e.g., a function was supposed to return a string, but because it exited the loop prematurely it returned undefined). This might cause data corruption.

To mitigate this, the last world should be backed up. On server start, the server should attempt to load the latest world and, if it is corrupted, load the backup instead.

A better solution might be to have a loose data parser which, when encountering unexpected results, would try its best not to crash.
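A sketch of the load-with-fallback step, assuming the caller passes in a loader that throws on a corrupted world. The function name loadWorldWithFallback and the loader signature are hypothetical. The real server's load routine is not shown in this issue.

```javascript
// Hypothetical sketch: try each candidate in order (latest first, then
// backups) and return the first one that loads without throwing.
function loadWorldWithFallback(paths, loadWorld) {
  const errors = [];
  for (const path of paths) {
    try {
      return { world: loadWorld(path), source: path, errors };
    } catch (err) {
      // Treat any loader failure as "corrupted" and move on.
      errors.push({ path, message: err.message });
    }
  }
  throw new Error('No loadable world among: ' + paths.join(', '));
}
```

Returning the collected errors lets the server log that it fell back to a backup rather than doing so silently.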
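One way to read "loose parser", sketched under the assumption that the saved data is JSON and that a table of defaults describes the expected fields. Neither assumption comes from this issue, and parseLoosely is a made-up name.

```javascript
// Hypothetical sketch: never throws. Any field that is missing or has
// the wrong type is replaced by its default.
function parseLoosely(text, defaults) {
  let raw;
  try {
    raw = JSON.parse(text);
  } catch {
    raw = {}; // unreadable input: fall back to defaults entirely
  }
  if (raw === null || typeof raw !== 'object' || Array.isArray(raw)) {
    raw = {};
  }
  const result = {};
  for (const [key, fallback] of Object.entries(defaults)) {
    const value = raw[key];
    const usable = value !== null && typeof value === typeof fallback;
    result[key] = usable ? value : fallback;
  }
  return result;
}
```

For example, parseLoosely('{"name":"w","seed":"oops"}', { name: 'world', seed: 0 }) keeps the name and replaces the bad seed with 0.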

Dinhero21 · 2023-10-29T01:45:44Z

Runtime.terminateExecution seems like a viable way of doing idea 1
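A rough sketch of how that could be wired up, assuming the watchdog runs as a worker thread so it can use inspector.Session#connectToMainThread(). The names startWatchdog and watchdogSource and all the timing values are made up, and Runtime.terminateExecution is marked experimental in the protocol. I have not verified how the server behaves after the running code is terminated.

```javascript
const { Worker } = require('node:worker_threads');

// Source of the watchdog thread (hypothetical). It attaches an inspector
// session to the main thread and watches for missing heartbeats.
const watchdogSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  const inspector = require('node:inspector');

  const session = new inspector.Session();
  session.connectToMainThread(); // only allowed inside a worker

  let lastBeat = Date.now();
  parentPort.on('message', () => { lastBeat = Date.now(); });

  setInterval(() => {
    if (Date.now() - lastBeat < workerData.timeoutMs) return;
    lastBeat = Date.now(); // do not fire again on the very next check
    // Ask V8 to terminate whatever JavaScript the main thread is running.
    session.post('Runtime.terminateExecution', (err) => {
      if (err) console.error('terminateExecution failed:', err.message);
    });
  }, workerData.checkMs);
`;

// Starts the watchdog and a heartbeat timer on the main thread.
function startWatchdog({ timeoutMs = 60_000, checkMs = 1_000, beatMs = 1_000 } = {}) {
  const worker = new Worker(watchdogSource, {
    eval: true,
    workerData: { timeoutMs, checkMs },
  });
  const beat = setInterval(() => worker.postMessage('alive'), beatMs);
  beat.unref();   // neither the timer nor the worker should
  worker.unref(); // keep the process alive on its own
  return {
    worker,
    stop: () => {
      clearInterval(beat);
      return worker.terminate();
    },
  };
}
```

The heartbeat timer lives on the main thread on purpose: when the main thread is stuck in a loop the timer stops firing, which is exactly what the worker notices.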

Dinhero21 · 2023-10-29T20:58:18Z

After a lot of searching, I finally found this which allows you to debug node remotely.

Dinhero21 changed the title ~~anti-while-loop-watchdog~~ anti-catastrophic-lag-watchdog Oct 28, 2023
