This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

Service key "expose" functionality is flaky on big changes #109

Open
rvolosatovs opened this issue Aug 9, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@rvolosatovs
Member

rvolosatovs commented Aug 9, 2022

Sometimes the expose-key script

exposeKey = pkgs: user: key:
  pkgs.writeShellScript "expose-${key}.sh" ''
    chmod 0400 "${key}"
    chown ${user}:${user} "${key}"
  '';

which runs as ExecStartPre

serviceConfig.ExecStartPre = "+${expose}";

fails with (example from sgx.equinix.try.enarx.dev):

Aug 09 11:23:36 snp rx9j1qmp42p8y4n50rbqdlsvahxwbyq3-expose--run-secrets-oidc-secret.sh[91406]: chown: invalid user: ‘benefice:benefice’

The benefice:benefice user and group should be created by systemd because of DynamicUser=true, but in some cases they apparently are not. It could also be a race condition; I'm not sure.

One way to work around this could be to rely on https://search.nixos.org/options?channel=22.05&show=systemd.services.%3Cname%3E.script&from=0&size=50&sort=relevance&type=packages&query=systemd.services.*.script/ in our module definitions and do this step in ExecStart instead (all definitions of this option are merged into one script, and the resulting script is then set as ExecStart). I believe that if the user/group does not exist yet at that point, that would be a systemd bug.

To work around this issue, we currently have to temporarily disable this functionality and run services as root. On a repeated deployment this issue does not occur.

We could also, of course, poll in ExecStartPre until the user and group are created, but that's ugly and error-prone.
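
For reference, the polling variant could look roughly like the sketch below. It is untested and makes several assumptions: the name exposeKeyPolling, the limit of 50 tries, and the 0.1 s interval are all made up for illustration, and the slash replacement in the script name is only a guess based on the script name seen in the log above. It uses `id -u` for the wait because it goes through the same user lookup that chown does.

```nix
{
  # Hypothetical variant of exposeKey: wait until the user resolves, then chmod/chown.
  exposeKeyPolling = pkgs: user: key:
    pkgs.writeShellScript "expose-${builtins.replaceStrings ["/"] ["-"] key}.sh" ''
      # Poll for roughly 5 seconds (50 tries, 0.1 s apart); both values are arbitrary.
      tries=0
      until ${pkgs.coreutils}/bin/id -u "${user}" >/dev/null 2>&1; do
        tries=$((tries + 1))
        if [ "$tries" -ge 50 ]; then
          echo "user ${user} still does not exist, giving up" >&2
          exit 1
        fi
        ${pkgs.coreutils}/bin/sleep 0.1
      done
      chmod 0400 "${key}"
      chown ${user}:${user} "${key}"
    '';
}
```

The timeout only turns a hang into a visible failure; if the user is never created during that start attempt, this still fails, just later.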

@puiterwijk any ideas?

@rvolosatovs
Member Author

rvolosatovs commented Aug 9, 2022

1e47b47 - build: disable `DynamicUser` and temporary key access

Signed-off-by: Roman Volosatovs <[email protected]>
(G) Roman Volosatovs <[email protected]> (Tue Aug 9 14:06:17 2022 +0200)


diff --git a/nixosConfigurations/services/benefice.nix b/nixosConfigurations/services/benefice.nix
index 8a90bc7..9098a69 100644
--- a/nixosConfigurations/services/benefice.nix
+++ b/nixosConfigurations/services/benefice.nix
@@ -31,7 +31,8 @@ with flake-utils.lib.system; let
       sops.secrets.oidc-secret.restartUnits = ["benefice.service"];
       sops.secrets.oidc-secret.sopsFile = "${self}/hosts/${config.networking.fqdn}/oidc-secret";
 
-      systemd.services.benefice = self.lib.systemd.withSecret config pkgs "benefice" "oidc-secret";
+      #systemd.services.benefice = self.lib.systemd.withSecret config pkgs "benefice" "oidc-secret";
+      systemd.services.benefice.serviceConfig.DynamicUser = pkgs.lib.mkForce false;
     })
   ];
 
diff --git a/nixosConfigurations/services/steward.nix b/nixosConfigurations/services/steward.nix
index 7262a33..1ae780c 100644
--- a/nixosConfigurations/services/steward.nix
+++ b/nixosConfigurations/services/steward.nix
@@ -22,7 +22,8 @@ with flake-utils.lib.system; let
       sops.secrets.key.restartUnits = ["steward.service"];
       sops.secrets.key.sopsFile = "${self}/hosts/${config.networking.fqdn}/steward.key";
 
-      systemd.services.steward = self.lib.systemd.withSecret config pkgs "steward" "key";
+      #systemd.services.steward = self.lib.systemd.withSecret config pkgs "steward" "key";
+      systemd.services.steward.serviceConfig.DynamicUser = pkgs.lib.mkForce false;
     })
   ];

This is the patch that disables the DynamicUser feature and the script. When the issue occurs, I apply it, deploy, revert, and redeploy.

@rvolosatovs rvolosatovs changed the title Service key "expose" functionality is flaky Service key "expose" functionality is flaky on big changes Aug 9, 2022
@rvolosatovs rvolosatovs added the bug Something isn't working label Aug 9, 2022
rvolosatovs added a commit that referenced this issue Aug 9, 2022
Ideally, this should not be necessary, but it's required due to #109

Signed-off-by: Roman Volosatovs <[email protected]>
rvolosatovs added a commit that referenced this issue Aug 9, 2022
Ideally, this should not be necessary, but it's required due to #109

Signed-off-by: Roman Volosatovs <[email protected]>
@rvolosatovs
Member Author

A workaround was applied in 31930cb, where we ensure that the benefice and steward users/groups exist (these are the only services experiencing this issue, since they use SOPS).
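
A minimal sketch of what "ensuring the users/groups exist" can look like in a NixOS module, for readers who do not want to dig up the commit. This is an assumption about the general shape and not a copy of 31930cb: it declares static system users and groups named after the two services. According to the systemd documentation, when a statically allocated user or group with the configured name already exists, DynamicUser uses it instead of allocating a new one.

```nix
{lib, ...}: let
  # The two services named in this issue.
  services = ["benefice" "steward"];
in {
  # One static group per service.
  users.groups = lib.genAttrs services (_: {});

  # One static system user per service, in the group of the same name.
  users.users = lib.genAttrs services (name: {
    isSystemUser = true;
    group = name;
  });
}
```

With the users declared statically, chown in the expose script no longer depends on when systemd allocates the dynamic user.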

@dpal dpal moved this to New in Profian Board Nov 23, 2022