Do not hard fail on a failed config parse in a manner which prevents the user from fixing it #342

C0rn3j · 2024-01-26T12:29:11Z

I apologize for the quality of this issue. I had a great one 95% completed until I had the bright idea to restart dbus-broker on my main system instead of the VM I was testing on, and everything was lost, including my will to rewrite everything properly.

This issue exists to prevent a situation like #337 from leaving the user without a usable terminal to fix things from.

To reproduce the state, run touch /usr/share/dbus-1/system.d/broken.conf and make sure a DM is enabled before rebooting.

The current state is that DBUS fails on loading a broken config, which makes systemd-logind fail, which in turn means getty is not allocated on VT2-VT5 like usual.
When a user has a DM enabled, which is the usual state, the DM additionally eats the only remaining terminal, VT1, preventing getty from starting there. At that point the only way forward is to reboot and hope that the bootloader is editable so one can do the booting into /bin/bash trick, or to boot a different operating system and fix it from there.

Solutions to this, of which it likely makes sense to implement multiple, if not all:

1. Make DM launch depend on DBUS launching correctly, so it doesn't eat the only available terminal. This may or may not be reasonable/possible - this is a distro packaging issue at that point I suppose.
2. On failure, find out which VT is free and launch a rescue getty there (equivalent of systemctl start [email protected]), and save which one that was so DBUS doesn't launch 5 of them on retries.
3. You can see the systemd-logind launch error written out in the image from the original post of the issue that I linked above. I don't see why DBUS couldn't log its own issue on tty0 too, along with the advice to switch to the rescue VT from 2) to check the full log.
4. Implement a config check tool and advise distros to use it in packaging, so the user is informed of a future failure BEFORE their system is rendered useless.
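
To make the config check tool idea from the list above more concrete, here is a rough Python sketch of what a packaging-time check could look like. It is not dbus-broker's actual validation logic: it only tests that each *.conf file is well-formed XML with a busconfig root element, which is enough to catch the empty broken.conf from the repro. The function names (check_config, check_directory, main) are made up for illustration.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: the directory from the repro above is the default to check.
DEFAULT_DIR = "/usr/share/dbus-1/system.d"


def check_config(path):
    """Return an error string for `path`, or None if it parses cleanly."""
    try:
        root = ET.parse(path).getroot()
    except (ET.ParseError, OSError) as exc:
        # An empty file (e.g. created with touch) ends up here.
        return f"{path}: {exc}"
    if root.tag != "busconfig":
        return f"{path}: root element is <{root.tag}>, expected <busconfig>"
    return None


def check_directory(directory):
    """Check every *.conf file in `directory` and return the list of errors."""
    errors = []
    for path in sorted(Path(directory).glob("*.conf")):
        error = check_config(path)
        if error is not None:
            errors.append(error)
    return errors


def main(argv):
    """Print all problems to stderr; return 1 if any were found, else 0."""
    directory = argv[0] if argv else DEFAULT_DIR
    problems = check_directory(directory)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

A package hook could call sys.exit(main(sys.argv[1:])) and refuse to proceed (or at least warn loudly) on a non-zero result, so the failure shows up at install time rather than on the next boot.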


rgudwin · 2024-01-26T19:46:31Z

I still believe that it is much simpler to, instead of failing, just print a warning, freeze for 30 seconds, and then ignore the failing configuration file and run the service without failure. Just paralyzing the system for 30 seconds during boot, will be enough for someone to try to discover what is wrong and fix the failing file. There is no need to make life difficult and freeze the whole system by not allowing a TTY. I understand that a simple warning will probably go unnoticed, but if you print a warning and lock the system for 30 seconds, this would be enough of an annoyance for someone to notice that there is something wrong and start looking to fix the non-compliant XML files.
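
The warn, pause, and skip behavior proposed above can be sketched in Python as follows. This is only an illustration of the idea, not how dbus-broker loads its configuration: load_configs and its parameters are hypothetical, XML well-formedness stands in for the real config validation, and pausing once per batch rather than once per broken file is an assumption of this sketch.

```python
import sys
import time
import xml.etree.ElementTree as ET


def load_configs(paths, delay_seconds=30, sleep=time.sleep):
    """Parse each config file; warn about and skip any that fail to parse.

    Returns (loaded, skipped), where `loaded` is a list of (path, root)
    pairs and `skipped` is the list of paths that were ignored.
    """
    loaded, skipped = [], []
    for path in paths:
        try:
            loaded.append((path, ET.parse(path).getroot()))
        except (ET.ParseError, OSError) as exc:
            print(f"WARNING: ignoring {path}: {exc}", file=sys.stderr)
            skipped.append(path)
    if skipped:
        # Assumption: pause once for the whole batch, so several broken
        # files do not add up to several delays.
        sleep(delay_seconds)
    return loaded, skipped
```

The sleep parameter is injectable only so the delay can be replaced in tests; a real service would simply keep the default.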

C0rn3j · 2024-01-26T20:08:47Z

> Just paralyzing the system for 30 seconds during boot, will be enough for someone to try to discover what is wrong and fix the failing file.

Whether a server I am rebooting takes 30 seconds more or less is completely invisible to me - the boot time is in minutes, and I don't even care about it in the first place.
Even my desktop takes two minutes due to the amount of storage and inherent DDR5 slowdowns made worse by poor firmware.
Also, you are greatly overestimating how unwilling people are to put up with a 30s slowdown instead of spending real time on fixing the issue.

I understand why it's failing, but it is rendering a regular desktop system irrecoverable, which is not good.

C0rn3j changed the title ~~Do not hard fail on DBUS failure~~ Do not hard fail on a failed config parse Jan 26, 2024

C0rn3j mentioned this issue Jan 29, 2024

The default position of critical infrastructure cannot be complete system failure #341

Open

C0rn3j changed the title ~~Do not hard fail on a failed config parse~~ Do not hard fail on a failed config parse in a manner which prevents the user from fixing it Jan 29, 2024
