Fail Secure mode #154

Merged
merged 18 commits into from
Oct 9, 2023

troglobit

This PR implements the Fail Secure mode. Best described by this image:

[image: fail-secure]

Also included are a few measures aiming to lock down Infix, as well as fixes to some minor buglets:

  • Fix duplicate ssh-hostkeys and improve error handling when generating SSH keys
  • Boot order fixes by upgrading to Finit v4.5-rc5
  • Rename condition provided by sysklogd: pid/sysklogd -> pid/syslogd since the two daemons providing the service are mutually exclusive
  • Guard critical services behind the <pid/syslogd> condition to guarantee boot order
  • Override Finit automatically generated dbus.conf to ensure it always runs after udevd and any udevadm call
  • Relocate all files related to bootstrap to src/confd/. The gathered files make sense to maintain together rather than scattered across the repo
  • Refactor factory-config generation to reuse scripts to generate failure-config
  • Add error case (see image above) which can be overridden by any br2-external using Infix

Failure to bootstrap the YANG models, or to create either factory-config or failure-config, now leads to the error case (routing VPD validation failures to the error case will be added later):

[image: fail-secure-bootstrap]

Failing to load either startup-config or failure-config now leads to the error case:

[image: fail-secure-load]

This change addresses a problem accessing Infix over SSH.  The root cause
turned out to be corrupt hostkeys, which on Infix live in /var/lib/ssh and
not in /etc.

The corruption was interesting in that the key files all existed, but had
size 0.  This state was not caught by our ssh-genhostkeys script and that
is what this change attempts to fix.

As before, the script starts by calling `sshd -t` to verify the hostkeys.
Unlike before, we now check for 'invalid format' in the output of that
command.  If any file with invalid format is found, we remove them and
regenerate the hostkeys.
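
The two checks described above can be sketched roughly as follows.  This is
not the actual ssh-hostkeys script: the function names are hypothetical, and
the sample `sshd -t` message used to exercise it is an assumption about the
exact wording, apart from the 'invalid format' string mentioned above.

```shell
#!/bin/sh
# Sketch only, not the real ssh-hostkeys script.  Function names are
# hypothetical helpers introduced for illustration.

# Succeed if the given `sshd -t` output reports a key with invalid format.
has_invalid_hostkey()
{
    case "$1" in
        *"invalid format"*) return 0 ;;
        *)                  return 1 ;;
    esac
}

# Print every ssh_host_* file in directory $1 that exists but has size 0,
# which is the corruption described in this commit.
empty_hostkeys()
{
    for f in "$1"/ssh_host_*; do
        [ -e "$f" ] || continue      # glob did not match anything
        [ -s "$f" ] || echo "$f"     # file exists but is empty
    done
}
```

A caller could feed the captured output of `sshd -t 2>&1` to
has_invalid_hostkey and, on a match, remove the key files and regenerate
them.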

In this investigation it was found that the 'ssh-keygen -A' command that
generates hostkeys does not use the directories specified for the given
files in sshd_config.  Instead, it always saves the files to /etc/ssh.

Also, since there is no rush in getting the hostkeys generated, we can
allow the script to wait for syslogd to start before it runs, even though
it all happens in runlevel S.

Signed-off-by: Joachim Wiberg <[email protected]>
Changes to the system concerning access rights, like users, should always
be logged, in particular when creating new users or when failing to create
users or modify their properties.

Signed-off-by: Joachim Wiberg <[email protected]>
Only create a default startup-config from factory-config if one is missing.
Do not load an existing startup-config, that is handled by a later step in
the boot process, which needs to be monitored (and displayed) separately to
fail over to a Fail Secure mode on error.
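
A minimal sketch of the "only if missing" rule, assuming the two file paths
are passed in as arguments.  The function name is hypothetical and the real
bootstrap script may differ.

```shell
#!/bin/sh
# Sketch only: ensure_startup_config is a hypothetical helper.

# ensure_startup_config FACTORY STARTUP
# Create STARTUP as a copy of FACTORY only when STARTUP does not exist.
ensure_startup_config()
{
    factory=$1
    startup=$2

    # An existing startup-config is left untouched and is NOT loaded
    # here; loading is done by a later, separately monitored boot step.
    [ -e "$startup" ] && return 0

    cp "$factory" "$startup"
}
```

Calling it a second time is a no-op, so a user's saved startup-config is
never overwritten by the factory defaults.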

Signed-off-by: Joachim Wiberg <[email protected]>
The new confd-load.sh script handles bootstrapping Infix using startup-config
or a failure-config (see later commits) on error.

When used with the -b (bootstrap) option, and the given file fails to load,
the script sets a Finit condition and goes to runlevel 9.  The condition can
be used to trigger loading of a failure-config to go to a Fail Secure mode.
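
The failure path described above might look something like the sketch below.
It is not the real confd-load.sh: LOADER, INITCTL, FAILCOND and the default
condition name are hypothetical placeholders, and it assumes Finit's
`initctl cond set` and `initctl runlevel` subcommands.

```shell
#!/bin/sh
# Sketch only, not the real confd-load.sh.  LOADER, INITCTL, FAILCOND
# and the default condition name are hypothetical placeholders.

# bootstrap_load FILE
# Try to load FILE.  On failure, assert a Finit condition and switch to
# runlevel 9 so that other services can react to the failure.
bootstrap_load()
{
    if "${LOADER:-false}" "$1"; then
        return 0
    fi

    "${INITCTL:-initctl}" cond set "${FAILCOND:-confd-load-failure}"
    "${INITCTL:-initctl}" runlevel 9
    return 1
}
```

Keeping the loader and initctl commands replaceable makes the failure path
easy to exercise on a development machine without Finit.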

Signed-off-by: Joachim Wiberg <[email protected]>
Instead, use the version from package/skeleton-init-finit, including the
ssh-hostkeys (previously ssh-genhostkeys) from the same package, because
it includes extensive error handling and logging on failure.

Signed-off-by: Joachim Wiberg <[email protected]>
 - with conditional execution support
 - fixes to udevd and udevadm calls in renamed 10-hotplug.conf
 - support for overriding internal services, e.g. dbus, keventd

Signed-off-by: Joachim Wiberg <[email protected]>
This commit drops sulogin from BusyBox, as well as the Finit replacement,
in the NETCONF builds.  The classic builds retain the Finit sulogin.

Furthermore, the Finit rescue mode (which uses sulogin) is disabled, so
in case of trouble at boot, e.g. missing fstab or failure to fsck, the
system will no longer go to sulogin but instead log the error to the
console and reboot.

  NOTE: the bootloader still needs to be locked down, otherwise a user
        could just as easily change kernel cmdline to 'shell=/bin/sh'

Misc. reshuffling and updated defaults are due to `make foo-update-defconfig`.

Signed-off-by: Joachim Wiberg <[email protected]>
This commit fixes a regression introduced in 15572e9 where /bin/bash
was unintentionally set as the default /bin/sh in the system.  This caused
several warnings and errors when a Bash-based /bin/sh tries to source
/etc/profile because $SHELL identifies it as /bin/bash.

The ietf-system.yang model, with the infix-system.yang extensions,
declares a per-user SHELL that allows /bin/clish, /bin/bash, /bin/sh
and /bin/false.  There should be a clear distinction between them.

This change also helps us keep bashisms away.  If a script needs
Bash features, declare: #!/usr/bin/env bash

Signed-off-by: Joachim Wiberg <[email protected]>
The sysklogd and BusyBox syslogd cannot run at the same time.  Since they
provide the same service we standardize on them providing <pid/syslogd>.

Also, ensure syslogd does not start until the fifth udevadm has completed.
This creates a barrier preventing other run/task/services from starting
too early, thus guaranteeing a proper boot order.

Signed-off-by: Joachim Wiberg <[email protected]>
This ensures they do not start earlier than the system log daemon.

Signed-off-by: Joachim Wiberg <[email protected]>
This ensures dbus is not started before any udevadm call has completed.

Signed-off-by: Joachim Wiberg <[email protected]>
The template and scripts for generating per-device factory-config have
been spread out across the repo.  This is an attempt to gather all the
pieces to a single location for better overview.

Parts of factory-config will be reused for the new fail secure mode, in
the file failure-config.  The beginnings of which are in this commit.

Other changes:
 - cfg-bootstrap and confd-bootstrap have been collapsed into one
 - let gen-hostname + gen-interfaces save to /cfg instead of /etc,
   we've moved the /etc directory to read-only storage in /usr/share
 - delay start of bootstrap and sysrepo-plugind after syslogd barrier
 - set 'norestart' when loading startup-config and failure-config,
   no point in retrying if that fails, just go to error immediately

Signed-off-by: Joachim Wiberg <[email protected]>
This rather huge change is a refactor of the factory-config generation to add
support for also generating a failure-config.

The confd bootstrap script has been given an rc file.  This both eases manual
testing, when modifying the script(s), and also makes it easier to override
from a br2-external.  Infix default is router/end-device, but a br2-external
may be a switch firmware and want to default to all ports in a bridge.

The generated failure-config creates a fail-safe "do no harm" config to boot
with in case startup-config for some reason is broken or cannot be applied,
e.g., bug in confd.  Meaning, for both the router and switch use-cases the
device will start up with all interfaces isolated¹, with an IPv6 SLAAC (EUI64)
address per interface.
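
For readers unfamiliar with the EUI64 part: the interface identifier of such
a SLAAC address is derived from the interface MAC address by flipping the
universal/local bit of the first octet and inserting ff:fe in the middle.
The sketch below only illustrates that derivation; the function name is
hypothetical and nothing like it is claimed to be part of this change, since
the address is normally formed by the network stack itself.

```shell
#!/bin/sh
# Illustration only: eui64 is a hypothetical helper, not part of Infix.

# eui64 MAC
# Print the modified EUI-64 interface identifier (uncompressed) for a
# MAC address written as aa:bb:cc:dd:ee:ff.
eui64()
{
    # Split the MAC on ':' into the positional parameters $1..$6
    oldifs=$IFS
    IFS=:
    set -- $1
    IFS=$oldifs

    # Flip the universal/local bit (0x02) in the first octet
    first=$(printf '%02x' "$(( 0x$1 ^ 0x02 ))")

    # Insert ff:fe between the third and fourth octets
    printf '%s%s:%sff:fe%s:%s%s\n' "$first" "$2" "$3" "$4" "$5" "$6"
}

eui64 00:11:22:33:44:55
```

Prefixing the result with fe80:: gives the corresponding link-local address
on that interface.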

Services enabled in this fail-safe mode are: LLDP, mDNS/SD, SSH, and NETCONF.
All to facilitate diagnostics, troubleshooting and device recovery.

Other noteworthy changes include:

 - rename factory/failure directories again -> factory.d/failure.d.  Use
   same naming as we do on target for directories holding generated files
 - The bootstrap script no longer regenerates /cfg/factory.d on each boot
 - The bootstrap script copies all static templates to /cfg/factory.d in
   case a newer image changes their contents.  This is for troubleshooting
 - Support for overriding the 20-interfaces.json generation by br2-external
 - Support for additional 30-config.json (override/extend) by br2-external
 - Expand gen-interfaces to support the bridge use-case
_____
¹ For a switch this means "no switchport", i.e., no switching between ports
  otherwise connected to a switchcore (or bridge) in startup-config.

Signed-off-by: Joachim Wiberg <[email protected]>
Signed-off-by: Joachim Wiberg <[email protected]>
 - New script 'error' that can be overridden by a br2-external
 - Call 'error' if YANG model bootstrap or factory-config.gen fails
 - Call 'error' if loading startup-config or failure-config fails

The 'error' script calls syslog¹ to log the error message:

    The device has reached an unrecoverable error, please RMA.
____
¹ to get a timestamped log message, and to send it to a remote server in
  case a br2-external has somehow hard-coded network for remote syslog
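
A minimal sketch of what such an overridable 'error' script could do,
assuming the standard logger(1) utility is what is meant by calling syslog.
The tag, the priority and the LOGGER override are assumptions made for
illustration; only the message text is taken from above.

```shell
#!/bin/sh
# Sketch only, not the real 'error' script.  The confd tag, user.crit
# priority and LOGGER override are assumptions for illustration.

report_error()
{
    # logger(1) timestamps the message via syslogd, which may also
    # forward it to a remote log server if one is configured.
    "${LOGGER:-logger}" -t confd -p user.crit \
        "The device has reached an unrecoverable error, please RMA."
}
```

A br2-external overriding the script could keep the log call and add its
own signalling on top, for example a fault LED.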

Signed-off-by: Joachim Wiberg <[email protected]>
Everything else is named confd, even syslog messages are logged as
confd, and we've talked about not using sysrepo-plugind one day, so
let's prepare for a world where everything is like that.

Also, and even more importantly, fix the sysrepo-plugind condition.
We cannot use <pid/syslogd>, because syslogd can be restarted, and as a
consequence Finit will stop sysrepo-plugind.  This first caused a bit of
head scratching because it caused a lot of very odd errors in the
execution of sysrepo-plugind, transactions being abruptly aborted for
instance.

What we can do, however, is use a static condition, which Finit has
supported for a few releases now.  We want to guard the start of our
sysrepo-plugind service behind the YANG /usr/libexec/confd/bootstrap,
so we can use <run/bootstrap/success>.  In case the bootstrap fails,
we catch that in the row before using if:<run/bootstrap/failure> to
trigger the /usr/libexec/confd/error script (described previously).

The other run/task/services can be guarded behind <pid/confd> and now
everything suddenly makes sense.

Signed-off-by: Joachim Wiberg <[email protected]>
@troglobit troglobit added the enhancement New feature or request label Oct 6, 2023
@troglobit troglobit added this to the Infix v23.10 milestone Oct 6, 2023
@troglobit troglobit requested a review from wkz October 6, 2023 18:35

@wkz wkz left a comment

Awesome job! 🚀

I think we should merge this and then improve the interface generation in a followup.

@wkz wkz merged commit db1d7f4 into main Oct 9, 2023
@wkz wkz deleted the fail-secure-mode branch October 9, 2023 08:41