Introduction to the Linux SECCOMP Module

Updated: 2024-10-22 12:26:21

Contents

Introduction to SECCOMP

SECCOMP-BPF

Differences between seccomp and capabilities

SECCOMP in Docker

Disabling seccomp

Security problems caused by disabling seccomp

References


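Introduction to SECCOMP

Seccomp is short for "secure computing". It was introduced in Linux kernel 2.6.12 (merged in March 2005) and is a kernel security mechanism that restricts the system calls a program may make. In its original form, a process that enabled seccomp was allowed only four system calls:

read, write, _exit, sigreturn

Consider the following example: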

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>

/* Switch the calling process into seccomp strict mode. */
void configure_seccomp() {
    printf("Configuring seccomp\n");
    prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
}

int main(int argc, char* argv[]) {
    int infd, outfd;

    if (argc < 3) {
        printf("Usage:\n\t%s <input path> <output_path>\n", argv[0]);
        return -1;
    }

    printf("Starting test seccomp Y/N?");
    char c = getchar();
    if (c == 'y' || c == 'Y') configure_seccomp();

    /* With strict mode enabled, the process is killed at this open(). */
    printf("Opening '%s' for reading\n", argv[1]);
    if ((infd = open(argv[1], O_RDONLY)) >= 0) {
        ssize_t read_bytes;
        char buffer[1024];

        printf("Opening '%s' for writing\n", argv[2]);
        if ((outfd = open(argv[2], O_WRONLY | O_CREAT, 0644)) >= 0) {
            /* Copy the input file to the output file block by block. */
            while ((read_bytes = read(infd, buffer, sizeof(buffer))) > 0)
                write(outfd, buffer, (size_t)read_bytes);
            close(outfd);
        }
        close(infd);
    }
    printf("End!\n");
    return 0;
}

Compile it with the following command:

gcc seccomp.cpp -o seccomp

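Run the program with in.txt and out.txt as arguments and answer N so that seccomp stays disabled: in.txt is copied to out.txt, so the copy succeeds.

If we answer Y instead, seccomp is enabled and the program is killed when it reaches the first open() call. In strict mode the process invoked a system call other than the four listed above, so the kernel terminated it with SIGKILL.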
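The kill can also be observed from a parent process. The following is a minimal sketch, not part of the original program: the helper name strict_mode_kill_signal is hypothetical, and it assumes a Linux system where the process is allowed to enter strict mode. A child switches to strict mode, makes one forbidden call (getpid), and the parent reports the signal that ended it.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/seccomp.h>

/*
 * Hypothetical helper (not from the article): fork a child, put it into
 * seccomp strict mode, and let it make one forbidden system call.
 * Returns the signal that terminated the child, 0 if the child exited
 * normally, or -1 if fork/waitpid failed.
 */
int strict_mode_kill_signal(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(1);          /* strict mode could not be enabled */
        syscall(SYS_getpid);   /* not one of the four allowed calls */
        /* Only reached if the kernel did not kill the child. The raw
         * exit syscall is used because glibc's _exit() calls exit_group,
         * which strict mode does not allow. */
        syscall(SYS_exit, 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Called from a main function, strict_mode_kill_signal() is expected to return SIGKILL on a kernel with seccomp support, which matches the behaviour seen when answering Y above.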

SECCOMP-BPF

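Linux 3.5 introduced a second seccomp mode: SECCOMP_MODE_FILTER. (From here on, Seccomp-BPF refers to this filter mode.)

In this mode a process can specify which system calls are allowed, instead of being limited to the original four. The rules are matched by a Berkeley Packet Filter program, which is the BPF in the name. To install a seccomp-BPF filter, the caller must either hold CAP_SYS_ADMIN or first set its no_new_privs bit to 1 with prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0).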
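As a rough illustration of filter mode, the sketch below installs a small BPF program that makes mkdir fail with EPERM and allows everything else. It is an assumption-laden example rather than a reference implementation: the helper names install_mkdir_filter and mkdir_errno_under_filter and the macro SKETCH_ARCH are hypothetical, only x86-64 and aarch64 are covered, and the x32 ABI is ignored. It uses the no_new_privs route, so CAP_SYS_ADMIN is not needed.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Assumption: the sketch only handles these two architectures. */
#if defined(__x86_64__)
#define SKETCH_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86-64 and aarch64"
#endif

/* Install a filter: mkdir/mkdirat return EPERM, all other calls pass. */
static int install_mkdir_filter(void) {
    struct sock_filter filter[] = {
        /* Kill the process if the architecture is not the expected one. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* Load the system call number and compare it. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
#ifdef __NR_mkdir   /* aarch64 has no separate mkdir syscall */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mkdir, 1, 0),
#endif
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mkdirat, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required for unprivileged callers before installing a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/*
 * Run mkdir(path) in a filtered child. Returns the child's errno from
 * mkdir, 0 if mkdir unexpectedly succeeded, 255 if the filter could not
 * be installed, or -1 on fork/waitpid failure.
 */
int mkdir_errno_under_filter(const char *path) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (install_mkdir_filter() != 0)
            _exit(255);
        if (mkdir(path, 0700) == 0) {
            rmdir(path);   /* clean up if the filter did not apply */
            _exit(0);
        }
        _exit(errno);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The filter is installed in a child process because a seccomp filter cannot be removed once it is in place; the parent stays unfiltered and only reads the child's exit status.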

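Differences between seccomp and capabilities

Both are security mechanisms. Seccomp restricts which system calls a process may invoke, whereas capabilities split the privileges of root into finer-grained units, and each process holds a set of them. Seccomp filters run at system call entry, so a call is checked by seccomp before the kernel reaches any capability check inside that call.

The list below names 39 capabilities; CAP_SYS_TIME and CAP_SYS_TTY_CONFIG also exist but are missing from it: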

CAP_AUDIT_CONTROL (since Linux 2.6.11)
CAP_AUDIT_READ (since Linux 3.16)
CAP_AUDIT_WRITE (since Linux 2.6.11)
CAP_BLOCK_SUSPEND (since Linux 3.5)
CAP_BPF (since Linux 5.8)
CAP_CHECKPOINT_RESTORE (since Linux 5.9)
CAP_CHOWN
CAP_DAC_OVERRIDE
CAP_DAC_READ_SEARCH
CAP_FOWNER
CAP_FSETID
CAP_IPC_LOCK
CAP_IPC_OWNER
CAP_KILL
CAP_LEASE (since Linux 2.4)
CAP_LINUX_IMMUTABLE
CAP_MAC_ADMIN (since Linux 2.6.25)
CAP_MAC_OVERRIDE (since Linux 2.6.25)
CAP_MKNOD (since Linux 2.4)
CAP_NET_ADMIN
CAP_NET_BIND_SERVICE
CAP_NET_BROADCAST
CAP_NET_RAW
CAP_PERFMON (since Linux 5.8)
CAP_SETGID
CAP_SETFCAP (since Linux 2.6.24)
CAP_SETPCAP
CAP_SETUID
CAP_SYS_ADMIN
CAP_SYS_BOOT
CAP_SYS_CHROOT
CAP_SYS_MODULE
CAP_SYS_NICE
CAP_SYS_PACCT
CAP_SYS_PTRACE
CAP_SYS_RAWIO
CAP_SYS_RESOURCE
CAP_SYSLOG (since Linux 2.6.37)
CAP_WAKE_ALARM (since Linux 3.0)
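Each capability is a bit in a mask, and the kernel exposes a process's effective set as the hexadecimal CapEff field of /proc/self/status. The sketch below shows one way to test such a mask; the helper names cap_mask_has and read_effective_caps are hypothetical. The example mask 0xa80425fb is derived from the bit numbers in linux/capability.h for the 14 capabilities that capsh reports for a default Docker container later in this article.

```c
#include <stdint.h>
#include <stdio.h>
#include <linux/capability.h>

/* Return 1 if capability number `cap` is set in `mask`, else 0. */
int cap_mask_has(uint64_t mask, int cap) {
    if (cap < 0 || cap > 63)
        return 0;
    return (int)((mask >> cap) & 1u);
}

/*
 * Read the effective capability mask of the current process from the
 * CapEff line of /proc/self/status. Returns 0 on success, -1 on failure.
 */
int read_effective_caps(uint64_t *mask) {
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return -1;

    char line[256];
    int result = -1;
    while (fgets(line, sizeof(line), f)) {
        unsigned long long value;
        if (sscanf(line, "CapEff: %llx", &value) == 1) {
            *mask = value;
            result = 0;
            break;
        }
    }
    fclose(f);
    return result;
}

/*
 * Example with the mask assumed for a default Docker container:
 *   cap_mask_has(0xa80425fbULL, CAP_CHOWN)      is 1
 *   cap_mask_has(0xa80425fbULL, CAP_SYS_ADMIN)  is 0
 */
```

Reading the mask this way avoids a dependency on libcap; a program that already links libcap would more likely use its own query functions.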

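Seccomp restricts the system call interface, so it can govern as many calls as the kernel exposes. In the x86-64 header unistd_64.h shown here, the highest system call number is 427 (the exact set differs between Linux versions). The numbering has gaps, jumping from 334 to the 424+ range, so the actual number of calls is lower than 427: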

#define __NR_statx 332
#define __NR_io_pgetevents 333
#define __NR_rseq 334
#define __NR_io_uring_setup 425
#define __NR_io_uring_enter 426
#define __NR_io_uring_register 427

#endif /* _ASM_X86_UNISTD_64_H */
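These numbers are what a seccomp filter actually compares against. As a small sketch (the helper name raw_getpid is hypothetical), the SYS_* constants from sys/syscall.h map to the same __NR_* values, and syscall() invokes a call directly by number:

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Invoke getpid by its system call number instead of through the libc
 * wrapper. SYS_getpid expands to __NR_getpid for the build architecture.
 */
long raw_getpid(void) {
    return syscall(SYS_getpid);
}
```

The value returned by raw_getpid() is the same as the one returned by getpid(), since both end up in the same kernel entry point.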

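Using seccomp in containers

Container runtimes use seccomp as a wrapper around Seccomp-BPF: a simple configuration file makes it quick to apply the same seccomp policy to many containers (the examples below all use Docker).

In Docker, a profile.json file tells the container which system calls to restrict, for example: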

{
    "defaultAction": "SCMP_ACT_ALLOW",
    "syscalls": [
        {
            "name": "mkdir",
            "action": "SCMP_ACT_ERRNO",
            "args": []
        }
    ]
}

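With this profile, the container may execute every system call except mkdir. Running "mkdir /home/test" inside the container fails to create the new directory.

The seccomp profile that Docker loads by default can be viewed on GitHub: .json

The default profile blocks a little over 40 system calls and allows more than 300. Technically it is an allowlist: its default action is SCMP_ACT_ERRNO, and only the listed calls are permitted.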

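SECCOMP in Docker

By default, Docker uses seccomp to disable around 44 system calls.

The following system calls are blocked by Docker's default seccomp profile. Note that reboot is among them, so a container cannot reboot the machine.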

Syscall | Description
acct | Accounting syscall which could let containers disable their own resource limits or process accounting. Also gated by CAP_SYS_PACCT.
add_key | Prevent containers from using the kernel keyring, which is not namespaced.
bpf | Deny loading potentially persistent bpf programs into kernel, already gated by CAP_SYS_ADMIN.
clock_adjtime | Time/date is not namespaced. Also gated by CAP_SYS_TIME.
clock_settime | Time/date is not namespaced. Also gated by CAP_SYS_TIME.
clone | Deny cloning new namespaces. Also gated by CAP_SYS_ADMIN for CLONE_* flags, except CLONE_NEWUSER.
create_module | Deny manipulation and functions on kernel modules. Obsolete. Also gated by CAP_SYS_MODULE.
delete_module | Deny manipulation and functions on kernel modules. Also gated by CAP_SYS_MODULE.
finit_module | Deny manipulation and functions on kernel modules. Also gated by CAP_SYS_MODULE.
get_kernel_syms | Deny retrieval of exported kernel and module symbols. Obsolete.
get_mempolicy | Syscall that modifies kernel memory and NUMA settings. Already gated by CAP_SYS_NICE.
init_module | Deny manipulation and functions on kernel modules. Also gated by CAP_SYS_MODULE.
ioperm | Prevent containers from modifying kernel I/O privilege levels. Already gated by CAP_SYS_RAWIO.
iopl | Prevent containers from modifying kernel I/O privilege levels. Already gated by CAP_SYS_RAWIO.
kcmp | Restrict process inspection capabilities, already blocked by dropping CAP_SYS_PTRACE.
kexec_file_load | Sister syscall of kexec_load that does the same thing, slightly different arguments. Also gated by CAP_SYS_BOOT.
kexec_load | Deny loading a new kernel for later execution. Also gated by CAP_SYS_BOOT.
keyctl | Prevent containers from using the kernel keyring, which is not namespaced.
lookup_dcookie | Tracing/profiling syscall, which could leak a lot of information on the host. Also gated by CAP_SYS_ADMIN.
mbind | Syscall that modifies kernel memory and NUMA settings. Already gated by CAP_SYS_NICE.
mount | Deny mounting, already gated by CAP_SYS_ADMIN.
move_pages | Syscall that modifies kernel memory and NUMA settings.
name_to_handle_at | Sister syscall to open_by_handle_at. Already gated by CAP_DAC_READ_SEARCH.
nfsservctl | Deny interaction with the kernel nfs daemon. Obsolete since Linux 3.1.
open_by_handle_at | Cause of an old container breakout. Also gated by CAP_DAC_READ_SEARCH.
perf_event_open | Tracing/profiling syscall, which could leak a lot of information on the host.
personality | Prevent container from enabling BSD emulation. Not inherently dangerous, but poorly tested, potential for a lot of kernel vulns.
pivot_root | Deny pivot_root, should be privileged operation.
process_vm_readv | Restrict process inspection capabilities, already blocked by dropping CAP_SYS_PTRACE.
process_vm_writev | Restrict process inspection capabilities, already blocked by dropping CAP_SYS_PTRACE.
ptrace | Tracing/profiling syscall. Blocked in Linux kernel versions before 4.8 to avoid seccomp bypass. Tracing/profiling arbitrary processes is already blocked by dropping CAP_SYS_PTRACE, because it could leak a lot of information on the host.
query_module | Deny manipulation and functions on kernel modules. Obsolete.
quotactl | Quota syscall which could let containers disable their own resource limits or process accounting. Also gated by CAP_SYS_ADMIN.
reboot | Don’t let containers reboot the host. Also gated by CAP_SYS_BOOT.
request_key | Prevent containers from using the kernel keyring, which is not namespaced.
set_mempolicy | Syscall that modifies kernel memory and NUMA settings. Already gated by CAP_SYS_NICE.
setns | Deny associating a thread with a namespace. Also gated by CAP_SYS_ADMIN.
settimeofday | Time/date is not namespaced. Also gated by CAP_SYS_TIME.
stime | Time/date is not namespaced. Also gated by CAP_SYS_TIME.
swapon | Deny start/stop swapping to file/device. Also gated by CAP_SYS_ADMIN.
swapoff | Deny start/stop swapping to file/device. Also gated by CAP_SYS_ADMIN.
sysfs | Obsolete syscall.
_sysctl | Obsolete, replaced by /proc/sys.
umount | Should be a privileged operation. Also gated by CAP_SYS_ADMIN.
umount2 | Should be a privileged operation. Also gated by CAP_SYS_ADMIN.
unshare | Deny cloning new namespaces for processes. Also gated by CAP_SYS_ADMIN, with the exception of unshare --user.
uselib | Older syscall related to shared libraries, unused for a long time.
userfaultfd | Userspace page fault handling, largely needed for process migration.
ustat | Obsolete syscall.
vm86 | In kernel x86 real mode virtual machine. Also gated by CAP_SYS_ADMIN.
vm86old | In kernel x86 real mode virtual machine. Also gated by CAP_SYS_ADMIN.

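After adding reboot to our own profile, we can start a container with that profile to check whether a reboot is possible from inside the container: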

docker run --rm \
    -it \
    --security-opt seccomp=/home/profile.json \
    hello-world

Disabling seccomp

docker run -it --security-opt seccomp=unconfined ubuntu:latest
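Whether a process is actually confined can be checked from inside the container through the Seccomp field of /proc/self/status, where 0 means disabled, 1 means strict mode and 2 means filter mode. The following is a minimal sketch of such a check; the helper names parse_seccomp_mode and current_seccomp_mode are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find the "Seccomp:" line in the text of a /proc/<pid>/status file.
 * Returns the mode (0 = disabled, 1 = strict, 2 = filter) or -1 if the
 * field is not present.
 */
int parse_seccomp_mode(const char *status_text) {
    const char *p = status_text;
    while (p && *p) {
        int mode;
        /* Matches only at the start of a line; "Seccomp_filters:" does
         * not match because the literal requires a ':' after "Seccomp". */
        if (sscanf(p, "Seccomp: %d", &mode) == 1)
            return mode;
        p = strchr(p, '\n');
        if (p)
            p++;   /* move to the first character of the next line */
    }
    return -1;
}

/* Read /proc/self/status and report the seccomp mode of this process. */
int current_seccomp_mode(void) {
    char buf[8192];
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_seccomp_mode(buf);
}
```

In a container started with seccomp=unconfined, current_seccomp_mode() would be expected to return 0, while a container running under a seccomp profile would be expected to report 2.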

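Security problems caused by disabling seccomp

Disabling seccomp enlarges Docker's attack surface. The default profile blocks a set of system calls, and each one that becomes reachable again adds another kernel code path that an attacker can exercise (for example, a path that contains an overflow bug). CVE-2022-0185 illustrates this: the bug itself is an integer overflow in the File System Context code, and exploiting it requires CAP_SYS_ADMIN. With seccomp disabled, a process in the container can call unshare (unshare -Urm) to enter a new user namespace in which it holds CAP_SYS_ADMIN; in other words, unshare enlarges the process's capability set. The experiment below shows this: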

sh-3.2# docker run -it --security-opt seccomp=unconfined centos:latest
[root@93bb4e20b766 /]# 
[root@93bb4e20b766 /]# capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+ep
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
Ambient set =
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=0(root)
gid=0(root)
groups=0(root)
[root@93bb4e20b766 /]# unshare -Urm
[root@93bb4e20b766 /]# capsh --print
Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,38,39,40+ep
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,38,39,40
Ambient set =
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=0(root)
gid=0(root)
groups=0(root)
[root@93bb4e20b766 /]#

For details, see the article: CVE-2022-0185 价值$3w的 File System Context 内核整数溢出漏洞利用分析_bsauce的博客-CSDN博客

References

Introduction to seccomp: BPF linux syscall filter - tycoon3 - 博客园 (cnblogs)

浅谈Linux SECCOMP安全机制在容器中的使用 - 腾讯云开发者社区-腾讯云 (tencent)

The Route to Host:从内核提权到容器逃逸 – 绿盟科技技术博客 (nsfocus)

Seccomp、BPF与容器安全 - 先知社区 (aliyun)

探究K8S v1.19 GA的Seccomp - 知乎 (zhihu)

云原生安全 — seccomp应用最佳实践-阿里云开发者社区 (aliyun)

Seccomp security profiles for Docker | Docker Documentation

Restrict a Container's Syscalls with seccomp | Kubernetes

seccomp - Wikipedia

容器安全之CVE-2022-0185_新闻中心-网盾网络安全培训中心

capabilities - Difference between linux capabities and seccomp - Information Security Stack Exchange

CVE-2022-0185 价值$3w的 File System Context 内核整数溢出漏洞利用分析_bsauce的博客-CSDN博客

Published: 2024-02-26 10:36:53