Kafel is a language and library for specifying syscall filtering policies. The policies are compiled into BPF code that can be used with seccomp-filter.
This is NOT an official Google product.
Compiling and installing a policy with detailed error reporting:
struct sock_fprog prog;
kafel_ctxt_t ctxt = kafel_ctxt_create();
kafel_set_input_string(ctxt, seccomp_policy);
if (kafel_compile(ctxt, &prog)) {
  fprintf(stderr, "policy compilation failed: %s", kafel_error_msg(ctxt));
  kafel_ctxt_destroy(&ctxt);
  exit(-1);
}
kafel_ctxt_destroy(&ctxt);
prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0);
free(prog.filter);
Compiling and installing a policy without detailed error reporting:
struct sock_fprog prog;
if (kafel_compile_string(seccomp_policy, &prog)) {
  fputs("policy compilation failed", stderr);
  exit(-1);
}
prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0);
free(prog.filter);
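Both snippets above call prctl without checking its result. On Linux, an unprivileged process can only install a seccomp filter after setting the no_new_privs bit; otherwise the call fails with EACCES. The following is a minimal sketch of a checked install. install_filter is a hypothetical helper name, not part of Kafel, and it accepts any struct sock_fprog, such as the one filled in by kafel_compile.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <linux/filter.h>   /* struct sock_fprog */
#include <linux/seccomp.h>  /* SECCOMP_MODE_FILTER */

/* Hypothetical helper (not part of Kafel): installs a compiled filter.
 * Returns 0 on success, -1 on failure. */
static int install_filter(const struct sock_fprog *prog) {
  assert(prog != NULL && prog->filter != NULL);
  /* Unprivileged processes must set no_new_privs before installing a filter. */
  if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
    perror("prctl(PR_SET_NO_NEW_PRIVS)");
    return -1;
  }
  if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog, 0, 0) != 0) {
    perror("prctl(PR_SET_SECCOMP)");
    return -1;
  }
  return 0;
}
```

A caller would use install_filter(&prog) in place of the bare prctl call and free prog.filter afterwards, as in the snippets above.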
A simple language is used to define policies.
A policy file has 3 parts:
- Constant definitions (optional)
- Policy definitions
- Top level policy declaration
Kafel supports the following number notations:
- Decimal: 42
- Hexadecimal: 0xfa1
- Octal: 0777
- Binary: 0b10101
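To make the values behind these notations concrete, here is a small C sketch that converts each kind of literal to a number. parse_literal is a hypothetical helper written for illustration; it is not Kafel's parser and performs no error checking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not Kafel's actual parser: converts a literal written
 * in one of the four notations above to its numeric value. */
static unsigned long parse_literal(const char *s) {
  assert(s != NULL);
  /* The 0b prefix is not portably recognized by strtoul(), so binary
   * literals are handled explicitly. */
  if (strncmp(s, "0b", 2) == 0) {
    return strtoul(s + 2, NULL, 2);
  }
  /* Base 0 lets strtoul() pick decimal, 0x hexadecimal or leading-0 octal. */
  return strtoul(s, NULL, 0);
}
```

With this helper, "42" yields 42, "0xfa1" yields 4001, "0777" yields 511 and "0b10101" yields 21.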
You may define numeric constants at the beginning of a policy file to make it more readable. The defined constants can then be used anywhere a number is expected.
#define MYCONST 123
A policy definition is a list of action blocks and use statements separated by commas.
samples/ contains some example policies that demonstrate supported features.
A USE someOtherPolicy statement behaves as if the body of someOtherPolicy was pasted in its place. You may only use policies defined before the use statement.
With use statements you can create meaningful groups of filtering rules that serve as building blocks for bigger policies.
An action block consists of a target and a list of syscall matching rules separated by commas.
The target of the first matching rule is the policy decision.
The following table lists Kafel targets and their corresponding seccomp-filter return values.
Kafel | seccomp-filter |
---|---|
ALLOW | SECCOMP_RET_ALLOW |
LOG | SECCOMP_RET_LOG |
KILL, DENY | SECCOMP_RET_KILL |
ERRNO(number) | SECCOMP_RET_ERRNO+number |
TRAP(number) | SECCOMP_RET_TRAP+number |
TRACE(number) | SECCOMP_RET_TRACE+number |
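For the targets that take a number, the +number in the table means that the number travels in the low 16 bits of the seccomp return value (the SECCOMP_RET_DATA bits), below the action bits. The C sketch below shows this composition using constants from linux/seccomp.h. ret_value is a hypothetical helper for illustration, not something Kafel exposes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <linux/seccomp.h>

/* Hypothetical helper: combines a seccomp action with its 16-bit data. */
static uint32_t ret_value(uint32_t action, uint32_t number) {
  /* The number must fit in the SECCOMP_RET_DATA bits. */
  assert((number & ~SECCOMP_RET_DATA) == 0);
  return action | number;
}
```

For example, ret_value(SECCOMP_RET_ERRNO, EPERM) corresponds to the Kafel target ERRNO(1), since EPERM is 1 on Linux. Masking the result with SECCOMP_RET_DATA recovers the number, and masking with SECCOMP_RET_ACTION recovers the action.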
A rule consists of a syscall name and an optional list of boolean expressions.
The boolean expressions in the list are separated by commas.
A comma is semantically equivalent to || but has the lowest precedence, so it can make a rule easier to read.
Normally, syscalls are specified by their names as defined in the Linux kernel.
However, you may also filter custom syscalls that are not in the standard syscall list.
You can either define a constant and use it in place of the syscall name or use the SYSCALL keyword.
#define mysyscall -1
POLICY my_const {
  ALLOW {
    mysyscall
  }
}
POLICY my_literal {
  ALLOW {
    SYSCALL[-1]
  }
}
Boolean expressions are used to filter syscalls based on their arguments. An expression resembles C language syntax, except that there are no arithmetic operators.
some_syscall(first_arg, my_arg_name) { first_arg == 42 && my_arg_name != 42 }
The bitwise AND (&) operator can be used to test for flags.
mmap { (prot & PROT_EXEC) == 0 }
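In C terms, the mmap rule above matches exactly those calls whose prot argument has the PROT_EXEC bit clear. The sketch below evaluates the same expression on plain integers. rule_matches and check_examples are hypothetical names used only for illustration; in practice the check is performed by the generated BPF program, not by C code.

```c
#include <assert.h>
#include <sys/mman.h>  /* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Hypothetical illustration of the expression (prot & PROT_EXEC) == 0. */
static int rule_matches(unsigned long prot) {
  return (prot & PROT_EXEC) == 0;
}

/* Two sample argument values and the expected outcome for each. */
static void check_examples(void) {
  assert(rule_matches(PROT_READ | PROT_WRITE));  /* no exec bit: matches */
  assert(!rule_matches(PROT_READ | PROT_EXEC));  /* exec bit set: no match */
}
```

Testing a single bit with & rather than comparing with == keeps the rule independent of whichever other protection flags the caller passes.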
You don't have to declare arguments for well-known syscalls; you can use their regular names as specified in the Linux kernel and man pages.
write { fd == 1 }
USE topLevel DEFAULT the_action
This specifies that the topLevel policy is compiled and that the action the_action is taken when no rule matches.
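Putting the three parts together, a complete policy might look like the string below, written as the seccomp_policy variable that the C snippets near the top pass to Kafel. This is an untested sketch assembled only from the syntax described in this document. The names STDOUT_FD, output, main_policy and get_policy are made up for illustration.

```c
#include <assert.h>

/* Hypothetical policy text; all names in it are illustrative only. */
static const char seccomp_policy[] =
    /* 1. Constant definitions (optional) */
    "#define STDOUT_FD 1\n"
    /* 2. Policy definitions */
    "POLICY output {\n"
    "  ALLOW {\n"
    "    write { fd == STDOUT_FD }\n"
    "  }\n"
    "}\n"
    "POLICY main_policy {\n"
    "  USE output,\n"
    "  ALLOW {\n"
    "    read, close, exit_group\n"
    "  }\n"
    "}\n"
    /* 3. Top level policy declaration */
    "USE main_policy DEFAULT KILL\n";

/* Hypothetical accessor: returns the policy text, e.g. for passing to
 * kafel_compile_string() as shown earlier. */
static const char *get_policy(void) {
  assert(sizeof(seccomp_policy) > 1);  /* the policy text is not empty */
  return seccomp_policy;
}
```

Here write is allowed only on file descriptor 1, read, close and exit_group are allowed unconditionally, and every other syscall falls through to the default KILL action.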
When used with nsjail, the following command creates a fairly constrained environment for your shell:
$ ./nsjail --chroot / --seccomp_string 'POLICY a { ALLOW { write, execve, brk, access, mmap, open, newfstat, close, read, mprotect, arch_prctl, munmap, getuid, getgid, getpid, rt_sigaction, geteuid, getppid, getcwd, getegid, ioctl, fcntl, newstat, clone, wait4, rt_sigreturn, exit_group } } USE a DEFAULT KILL' -- /bin/sh -i
[2017-01-15T21:53:08+0100] Mode: STANDALONE_ONCE
[2017-01-15T21:53:08+0100] Jail parameters: hostname:'NSJAIL', chroot:'/', process:'/bin/sh', bind:[::]:0, max_conns_per_ip:0, uid:(ns:1000, global:1000), gid:(ns:1000, global:1000), time_limit:0, personality:0, daemonize:false, clone_newnet:true, clone_newuser:true, clone_newns:true, clone_newpid:true, clone_newipc:true, clonew_newuts:true, clone_newcgroup:false, keep_caps:false, tmpfs_size:4194304, disable_no_new_privs:false, pivot_root_only:false
[2017-01-15T21:53:08+0100] Mount point: src:'/' dst:'/' type:'' flags:0x5001 options:''
[2017-01-15T21:53:08+0100] Mount point: src:'(null)' dst:'/proc' type:'proc' flags:0x0 options:''
[2017-01-15T21:53:08+0100] PID: 18873 about to execute '/bin/sh' for [STANDALONE_MODE]
/bin/sh: 0: can't access tty; job control turned off
$ set
IFS='
'
OPTIND='1'
PATH='/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
PPID='0'
PS1='$ '
PS2='> '
PS4='+ '
PWD='/'
$ id
Bad system call
$ exit
[2017-01-15T21:53:17+0100] PID: 18873 exited with status: 159, (PIDs left: 0)