Add Bubblewrap implementation for sandboxing #153

RobertRosca · 2023-12-10T07:27:14Z

Adds sandboxing via bubblewrap for context file execution, closes #150.

Main functionality is implemented via a Bubblewrap class which is a convenience class for building up the CLI arguments/flags/mounts required to sandbox a python process to only have access to the data from a single proposal directory.

There's also a new flag --no-sandbox to disable the sandboxing feature. This can be set at launch time for the listener and will be propagated to other subprocess commands down to the final relevant extract_in_subprocess call.
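As a rough illustration of how such a flag can be propagated down to a child process, here is a minimal sketch. The parser, the `child_command` helper and the `sandbox` destination name are hypothetical and are not taken from the PR; only the `--no-sandbox` flag and the `damnit.backend.extract_data` module name appear in this conversation.

```python
import argparse
import sys


def parse_args(argv=None):
    # Hypothetical parser: sandboxing is on unless --no-sandbox is given.
    parser = argparse.ArgumentParser(prog="listener-sketch")
    parser.add_argument(
        "--no-sandbox",
        dest="sandbox",
        action="store_false",
        help="run extraction without the bubblewrap sandbox",
    )
    return parser.parse_args(argv)


def child_command(sandbox: bool) -> list:
    # Hypothetical helper: forward the choice to the next subprocess.
    cmd = [sys.executable, "-m", "damnit.backend.extract_data"]
    if not sandbox:
        cmd.append("--no-sandbox")
    return cmd


args = parse_args(["--no-sandbox"])
print(child_command(args.sandbox))
```

Keeping the option as a negative flag means that a process started without any extra arguments stays sandboxed by default.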

Summary

Bubblewrap implements:

__init__: initializes with some default flags and bind mounts for running the sandboxed process. These defaults are all either security/isolation related (unshare/disable namespaces) or required binds/settings (e.g. share network, bind /bin, /lib, etc.).
add_bind: main method for adding a bind mount with a source, a destination, and a flag to set the mount to read-only or not.
add_bind_proposal: takes in a proposal number, finds the directory, bind mounts it, then resolves the top-level symlinks in the proposal directory and bind mounts those in as well.
add_bind_venv: takes the path to a Python executable and, if it is running in a venv, bind mounts all paths required by it into the sandbox.
build_command: takes in the command to sandbox and prepends the full bubblewrap command to it, then returns a list that can be passed to subprocess to start the sandboxed command.
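To make the shape of such a builder concrete, here is a simplified sketch. `BubblewrapSketch` is a hypothetical stand-in, not the class from this PR: it only covers flags, `add_bind` and `build_command`, and it takes the command as a list, which is an assumption. The bwrap options used are the ones mentioned in this conversation.

```python
from __future__ import annotations

from pathlib import Path


class BubblewrapSketch:
    """Hypothetical, simplified stand-in for the PR's Bubblewrap class."""

    def __init__(self) -> None:
        # Isolation defaults mentioned in this PR.
        self.flags: list[str] = ["--unshare-all", "--share-net", "--die-with-parent"]
        self.binds: list[tuple[str, str, str]] = []

    def add_bind(self, source: Path | str, dest: Path | str | None = None, ro: bool = False) -> None:
        # Mount at the same path inside the sandbox unless told otherwise.
        dest = source if dest is None else dest
        self.binds.append(("--ro-bind" if ro else "--bind", str(source), str(dest)))

    def build_command(self, command: list[str]) -> list[str]:
        args = ["bwrap", *self.flags]
        for flag, source, dest in self.binds:
            args += [flag, source, dest]
        # An explicit "--" separates bwrap's options from the wrapped command.
        return [*args, "--", *command]


sandbox = BubblewrapSketch()
sandbox.add_bind("/usr", ro=True)
print(sandbox.build_command(["/bin/bash"]))
```

Collecting binds in a list and only rendering them in `build_command` keeps the final argument list easy to inspect in tests without actually running bwrap.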

There are some tests to check that the command is built (at least somewhat) correctly, but they're not that robust since I'm not sure how reliable testing with something like bubblewrap is when running in a CI environment.

Example

An example of building up the commands would be:

b = Bubblewrap()

b.add_bind_proposal(3422)

b.add_bind_venv("/gpfs/exfel/u/usr/MID/202304/p003422/Software/analysis_env/bin/python")

b.build_command("/bin/bash")

Which creates:

bwrap \
  --unshare-all \
  --share-net \
  --dev /dev \
  --tmpfs /tmp \
  --dir /gpfs \
  --ro-bind /bin /bin \
  --ro-bind /etc/resolv.conf /etc/resolv.conf \
  --ro-bind /gpfs/exfel/sw/software /gpfs/exfel/sw/software \
  --ro-bind /lib /lib \
  --ro-bind /lib64 /lib64 \
  --ro-bind /sbin /sbin \
  --ro-bind /usr /usr \
  --bind /gpfs/exfel/exp/MID/202304/p003422 /gpfs/exfel/exp/MID/202304/p003422 \
  --bind /gpfs/exfel/u/scratch/MID/202304/p003422 /gpfs/exfel/u/scratch/MID/202304/p003422 \
  --bind /pnfs/xfel.eu/exfel/archive/XFEL/raw/MID/202304/p003422 /pnfs/xfel.eu/exfel/archive/XFEL/raw/MID/202304/p003422 \
  --bind /gpfs/exfel/u/usr/MID/202304/p003422 /gpfs/exfel/u/usr/MID/202304/p003422 \
  --bind /gpfs/exfel/d/proc/MID/202304/p003422 /gpfs/exfel/d/proc/MID/202304/p003422 \
  --ro-bind /gpfs/exfel/u/usr/MID/202304/p003422/Software/analysis_env/lib/python3.9 /gpfs/exfel/u/usr/MID/202304/p003422/Software/analysis_env/lib/python3.9 \
  --ro-bind /gpfs/exfel/u/usr/MID/202304/p003422/Software/analysis_env/lib/python3.9 /gpfs/exfel/u/usr/MID/202304/p003422/Software/analysis_env/lib/python3.9 \
  --ro-bind /gpfs/exfel/u/usr/MID/202304/p003422/Software/analysis_env/lib/python3.9/site-packages /gpfs/exfel/u/usr/MID/202304/p003422/Software/analysis_env/lib/python3.9/site-packages \
  --ro-bind /gpfs/exfel/u/usr/MID/202304/p003422/Software/analysis_env/lib/python3.9/site-packages /gpfs/exfel/u/usr/MID/202304/p003422/Software/analysis_env/lib/python3.9/site-packages \
  --ro-bind /gpfs/exfel/u/usr/MID/202304/p003422/Software/analysis_env/include/python3.9 /gpfs/exfel/u/usr/MID/202304/p003422/Software/analysis_env/include/python3.9 \
  --ro-bind /gpfs/exfel/u/usr/MID/202304/p003422/Software/analysis_env/include/python3.9 /gpfs/exfel/u/usr/MID/202304/p003422/Software/analysis_env/include/python3.9 \
  --ro-bind /gpfs/exfel/u/usr/MID/202304/p003422/Software/analysis_env/bin /gpfs/exfel/u/usr/MID/202304/p003422/Software/analysis_env/bin \
  --ro-bind /gpfs/exfel/u/usr/MID/202304/p003422/Software/analysis_env /gpfs/exfel/u/usr/MID/202304/p003422/Software/analysis_env \
  /bin/bash
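The proposal binds in the command above come from the proposal directory plus the targets of its top-level symlinks. A minimal sketch of that resolution step, assuming a hypothetical helper `proposal_bind_paths` that is handed the proposal directory directly (looking the directory up from a proposal number is left out):

```python
from pathlib import Path


def proposal_bind_paths(proposal_dir: Path) -> list:
    """Return the proposal directory plus the targets of its top-level symlinks."""
    paths = [proposal_dir]
    for entry in sorted(proposal_dir.iterdir()):
        # Only top-level symlinks are followed; regular entries are already
        # covered by binding the proposal directory itself.
        if entry.is_symlink():
            target = entry.resolve()
            if target not in paths:
                paths.append(target)
    return paths
```

Each returned path would then be bind mounted at the same location inside the sandbox, so that the symlinks in the proposal directory keep working.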

Questions

Main one is... where should this go? - Went with making it part of `extract_in_subprocess`

It can be added anywhere there is a subprocess command, or even in the ctxrunner:

DAMNIT/damnit/backend/listener.py, line 150 in 377af28 (`extract_proc = subprocess.Popen([`).

DAMNIT/damnit/backend/extract_data.py, line 56 in 377af28 (`return subprocess.run(args, env=env, **kwargs)`). This seems like a good option since it would also cover the case of slurm jobs, as far as I understand from just glancing at line 283 in 377af28 (`python_cmd = [sys.executable, '-m', 'damnit.backend.extract_data',`).

DAMNIT/damnit/ctxsupport/ctxrunner.py, line 410 in 377af28 (`def main(argv=None):`), by creating a new flag --sandbox which, if present, re-executes itself within the sandbox.

Other questions are:

Right now I'm only setting read-only binds for the python venv (if there is one). Alternative would be to mount everything as read only by default, and explicitly pass a single output directory which is writable. - spoke to James, decided to leave proposal dir as normal mount instead of having everything read only and one output directory due to other tools potentially writing to other directories.
--die-with-parent is set to kill the sandbox if the parent (DAMNIT) process dies, I assume that's desirable? - spoke to James, seems fine.
--unshare-all and some of the binds break (on purpose) authentication, slurm, ssh, etc..., I assume people don't expect to be able to have some function in a context file that runs a subprocess with ssh or something? - spoke to James, mentioned that xwiz will run slurm commands on its own which would break, added in option to disable sandboxing when the listener is spawned.

TODO:

Test read write to db and context file
Test supervisor start with no sandbox

RobertRosca · 2023-12-12T09:31:40Z

Seeemmmss to work, at least to some degree. Tested it with the following context file:

import subprocess
import socket
from datetime import timedelta
from pathlib import Path

import numpy as np

from damnit_ctx import Variable

@Variable(title="Trains")
def n_trains(run):
    return len(run.train_ids)

@Variable(title="Proposals")
def list_proposals(run):
    return str(list(Path("/gpfs/exfel/exp/").glob("*")))

@Variable(title="Slurm")
def srun(run):
    return subprocess.run(
        ["sacct"],
        text=True,
        stdout = subprocess.PIPE,
        stderr = subprocess.STDOUT,
    ).stdout

@Variable(title="Web")
def ping(run):
    return subprocess.run(
        ["curl", "example.com"],
        text=True,
        stdout = subprocess.PIPE,
        stderr = subprocess.STDOUT,
    ).stdout

@Variable(title="SSH")
def ssh_hostname(run):
    return subprocess.run(
        ["ssh", "max-exfl", "hostname"],
        text=True,
        stdout = subprocess.PIPE,
        stderr = subprocess.STDOUT,
    ).stdout

@Variable(title="Slurm Executed", cluster=True)
def slurm_executed(run):
    host = socket.getfqdn()

    proposals = len(list(Path("/gpfs/exfel/exp/").glob("*/*/*")))

    return f"{host} - {proposals}"

Which updates the sqlite database and creates the extracted data files successfully. The output is:

Full output

4507, 
1, 
1697763475.124875, 
1702324474.924, 
None, 
"[PosixPath('/gpfs/exfel/exp/FXE/202302/p004507')]", 
12373, 
'sacct: error: resolve_ctls_from_dns_srv: res_nsearch error: Unknown host\nsacct: error: fetch_config: DNS SRV lookup failed\nsacct: error: _establish_config_source: failed to fetch config\nsacct: fatal: Could not establish a configuration source\n', 
'max-exfl093.desy.de - 1', 
'No user exists for uid 33392\n', 
'  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\n  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\n100  1256  100  1256    0     0   6197      0 --:--:-- --:--:-- --:--:--  6217\n<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset="utf-8" />\n    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />\n    <meta name="viewport" content="width=device-width, initial-scale=1" />\n    <style type="text/css">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n    <p><a href="https://www.iana.org/domains/example">More information...</a></p>\n</div>\n</body>\n</html>\n'

"[PosixPath('/gpfs/exfel/exp/FXE/202302/p004507')]" - only the relevant FXE data is mounted and available
12373 - number of trains in some run
'sacct: error: resolve_ctls_from_dns_srv...' - error caused by sandboxing
'No user exists for uid...' - ssh broken due to lack of /etc mounts
'<title>Example D...' - normal internet connectivity is fine
'max-exfl093.desy.de - 1' - slurm job allocated on node and still only saw one proposal directory

JamesWrigley

One other thing, could you write some docs about this in backend.md? Including how to enable/disable it.

Also, your code is outstanding :) I love all the documentation and type hints ❤️

damnit/backend/extract_data.py

JamesWrigley · 2023-12-12T12:45:09Z

damnit/backend/listener.py

 self.extract_procs_queue.put((proposal, run, extract_proc))

-def listen():
+def listen(sandbox: bool):


Could we move the sandbox setting to the database's metameta table? That way we can set it relatively easily with amore-proto db-config and we wouldn't have to restart the listener after changing it. It would also make it simpler to move to a centralized listener in the future and have different sandbox settings for different proposals.

Hmm yeah, that was my first idea, but it depends on how 'secure' the configuration should be, since if it's in the database then users could enable/disable the sandboxing as easily as we could, unless the db is read only to the DAMNIT user, which would break a lot of things.

Keeping the setting as part of how the process is started means that the user running the listener could have sandboxing on, while others can still modify the DB/run reprocessing themselves.

I think options are:

1. Make it part of the table with tables as read-only, enforces sandboxing for processes, but breaks user write permissions to the table.

2. Make it part of the table, still read-write, risk of users disabling sandboxing.

3. Some extra configuration file in the proposal directory/for the listener which is read-only to the DAMNIT user. This kind of config may end up being required as part of the move to a central listener anyway.

4. Something set as part of the supervisor config?

5. Leave it as a flag.

With all options ending in "for now" 😛

I'd say let's go with option 2 for now, and later we can move it to the centralized listener settings (which should be inaccessible by users).
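For reference, reading such a setting from the database could look roughly like the sketch below. It assumes metameta is a simple key/value table and uses a hypothetical `sandbox` key; neither the schema nor the key name is confirmed in this thread.

```python
import sqlite3


def sandbox_enabled(conn: sqlite3.Connection, default: bool = True) -> bool:
    """Read a hypothetical 'sandbox' key from a key/value metameta table."""
    row = conn.execute(
        "SELECT value FROM metameta WHERE key = ?", ("sandbox",)
    ).fetchone()
    # Fall back to sandboxing when the key has never been set.
    return default if row is None else bool(int(row[0]))


# Usage with an in-memory database standing in for the proposal database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metameta(key PRIMARY KEY NOT NULL, value)")
print(sandbox_enabled(conn))
conn.execute("INSERT INTO metameta VALUES ('sandbox', 0)")
print(sandbox_enabled(conn))
```

Defaulting to sandboxing when the key is missing keeps existing databases safe, but as noted above anyone with write access to the database could still turn it off.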

damnit/backend/sandboxing.py

JamesWrigley · 2023-12-12T12:55:23Z

tests/test_sandboxing.py

+
+
+@pytest.fixture
+def bubblewrap():


Nitpicking, could the fixtures go into conftest.py with the others?

RobertRosca

> One other thing, could you write some docs about this in backend.md? Including how to enable/disable it.

Yep, I'll add some stuff there, including notes on when it should be disabled.

> Also, your code is outstanding :) I love all the documentation and type hints ❤️

Aha thanks, my IDE blinds me with warnings if I don't 😂 😂 good motivator

RobertRosca · 2023-12-12T10:48:29Z

damnit/cli.py


 elif args.subcmd == 'reprocess':
 # Hide some logging from Kafka to make things more readable
 logging.getLogger('kafka').setLevel(logging.WARNING)

 from .backend.extract_data import reprocess
- reprocess(args.run, args.proposal, args.match, args.mock)
+ reprocess(args.run, args.proposal, args.match, args.mock, args.no_sandbox)


damnit/backend/extract_data.py

RobertRosca · 2023-12-12T15:40:54Z

damnit/backend/sandboxing.py

RobertRosca · 2023-12-12T15:43:45Z

tests/test_sandboxing.py

+
+
+@pytest.fixture
+def bubblewrap():


takluyver · 2023-12-21T15:51:24Z

damnit/backend/sandboxing.py

+ if venv == "False":
+ return


I think it would be surprising that this method is a complete no-op if the Python you point to isn't a venv. Especially if people use conda envs - they're technically not venvs, but I think people usually expect things to work in the same way.

Would it make sense to turn this into add_bind_python and try to do the right thing with the target Python, venv or not?

It could also start by adding the main env directory (i.e. the bit before bin/python) and only add extra paths if they're not under that; it should be the same, but a shorter command is easier to make sense of if we ever need to.

(Also, I think you'd actually end up with "False\n" here if the target is not a venv, so the check would fail. It might be worth using JSON to send details back, just to avoid fiddly details like this.)
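A sketch of what sending the details back as JSON could look like. The `probe_python` helper and the fields it reports are hypothetical choices for illustration, not code from the PR; comparing `sys.prefix` with `sys.base_prefix` is the standard way to detect a venv.

```python
import json
import subprocess
import sys

# Snippet executed by the target interpreter; it prints one JSON object.
PROBE = (
    "import json, sys; "
    "print(json.dumps({'prefix': sys.prefix, 'base_prefix': sys.base_prefix, "
    "'paths': sys.path}))"
)


def probe_python(python: str) -> dict:
    """Ask another Python interpreter about its environment (hypothetical helper)."""
    result = subprocess.run(
        [python, "-c", PROBE], capture_output=True, text=True, check=True
    )
    # json.loads ignores the trailing newline, so there is no "False\n" pitfall.
    info = json.loads(result.stdout)
    info["is_venv"] = info["prefix"] != info["base_prefix"]
    return info


print(probe_python(sys.executable)["is_venv"])
```

Because the prefix is always reported, the caller can bind the main environment directory first whether or not the target is a venv, and only add extra paths that are not already under it.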

takluyver · 2023-12-21T16:06:19Z

damnit/backend/extract_data.py

@@ -106,6 +108,24 @@ def extract_in_subprocess(
 for m in match:
 args.extend(['--match', m])

+ if sandbox:
+ bubblewrap = Bubblewrap()
+ with contextlib.suppress(Exception):


Suggested change

- with contextlib.suppress(Exception):
+ with contextlib.suppress(FileNotFoundError):

Is that what we're trying to catch? Do we expect it to come up in real use, or only in testing?
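To illustrate the difference the narrower exception makes, here is a small self-contained sketch. The `try_bind` helper is hypothetical and only stands in for whatever call is wrapped in the PR.

```python
import contextlib


def try_bind(action) -> bool:
    """Run `action`, ignoring only a missing path (hypothetical helper)."""
    with contextlib.suppress(FileNotFoundError):
        action()
        return True
    # Reached only when FileNotFoundError was raised and suppressed.
    return False


def missing_path():
    raise FileNotFoundError("/no/such/proposal")


def programming_error():
    raise TypeError("bad argument")


print(try_bind(missing_path))
try:
    try_bind(programming_error)
except TypeError as exc:
    print(f"not suppressed: {exc}")
```

With `suppress(Exception)` the `TypeError` would be swallowed as well, which would hide genuine bugs in the sandbox setup.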

RobertRosca added 2 commits December 10, 2023 07:41

Add Bubblewrap implementation for sandboxing

79e4852

Add tests for bubblewrap sandboxing

588e504

RobertRosca self-assigned this Dec 10, 2023

RobertRosca added 7 commits December 10, 2023 08:53

Remove redundant calls to Path

9ef250e

Only bind gpfs sw if it exists

22d80af

Split bind mount string in two

b47ccd0

Split bubblewrap args from command with explicit --

4922106

Integrate bubblewrap with extract_in_subprocess

b7cee2a

Do not sandbox during fixture setup

63867b9

Bind mount /tmp into bubblewrap

c47ec31

Required as temporary directory is used for storing some data

RobertRosca force-pushed the feat/sandboxing branch from 0d510b1 to 84719c2 Compare December 11, 2023 14:00

Install bubblewrap in test environment

261a5df

RobertRosca force-pushed the feat/sandboxing branch from 84719c2 to 261a5df Compare December 11, 2023 14:03

RobertRosca added 10 commits December 11, 2023 15:18

Add tests for read/write permissions/uid in sandbox

cc85af3

Add and propagate "--no-sandbox" flag

afbca2e

Explicitly bind out path in to bubblewrap

e614782

Test propagation of --no-sandbox

2bbaa01

Test bwrap call generated by extract_in_subprocess

c8b2731

Propagate sandbox flag to reprocess

1866b3a

Import from future for older python

6551b2c

Append no sandbox flag to command list

f978ad1

Mount in cwd for context/db, add note on file mounts

a1910a9

Add todo on mounting in the actual context file and db instead of cwd

2a8033b

RobertRosca marked this pull request as ready for review December 12, 2023 09:31

RobertRosca requested review from JamesWrigley, takluyver and tmichela December 12, 2023 09:32

JamesWrigley requested changes Dec 12, 2023

RobertRosca commented Dec 12, 2023

Fix reprocess --no-sandbox flag negation

8a7e81b

takluyver reviewed Dec 21, 2023
