Limiting available system calls
Effective software security is best done in layers; an attacker who penetrates one layer then encounters another. An increasingly common strategy is to attack the kernel itself: if an attacker succeeds in injecting code into kernel space, it is game over.
As an example, Ang Cui and Salvatore Stolfo of Columbia University discovered vulnerabilities in Cisco phones that allowed them to inject arbitrary code into kernel memory. They found these vulnerabilities by fuzzing the kernel through its system calls.
Not all processes in a system need to be able to make all possible system calls. Limiting the available system calls has two benefits:
- an attacker who exploits a vulnerability is able to do less with the hijacked process
- fewer system calls are exposed, limiting access to potential vulnerabilities in the kernel
One way to limit the system calls available to a process is seccomp, short for Secure Computing Mode. Seccomp is a mechanism in the Linux kernel that lets a process make a one-way transition into a restricted mode in which only exit(), sigreturn(), and read() and write() on already-open file descriptors are permitted. Any other system call kills the process with SIGKILL. (With seccomp-bpf filters, a rejected call typically terminates the process with SIGSYS instead.)
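The one-way transition can be seen in a small experiment. The following is a minimal sketch, not code from ffmpeg: the function name strict_mode_demo is made up for illustration, and the forbidden call (getppid) is an arbitrary choice. It forks a child so that the calling process itself stays unrestricted.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that enters strict mode and then makes
 * a forbidden call. Returns the signal that killed the child, the child's
 * exit status if it exited normally, or -1 if fork/waitpid failed. */
int strict_mode_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* One-way transition: nothing can switch strict mode off again. */
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(1);

        /* write() on an already-open descriptor is still permitted. */
        static const char msg[] = "in strict mode\n";
        if (write(STDOUT_FILENO, msg, sizeof msg - 1) < 0)
            syscall(SYS_exit, 1);

        /* Any other call, even a harmless getppid, is fatal. */
        syscall(SYS_getppid);

        /* Not reached. glibc's _exit() uses exit_group, which strict mode
         * also forbids, so a clean exit needs the raw exit system call. */
        syscall(SYS_exit, 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : WEXITSTATUS(status);
}
```

On a kernel with seccomp enabled, the child should be reported as killed by SIGKILL.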
An extension of this, proposed in 2012, is called seccomp-bpf, which allows the filtering of system calls using Berkeley Packet Filter rules. It allows a process to more finely control which system calls can be used, in addition to checking the arguments passed to the system calls.
Seccomp-bpf on ffmpeg
The seccomp-bpf technique is of interest because it provides a flexible way to specify the allowed system calls per process. Past posts used ffmpeg to compare the performance of various security features, and this post does so as well.
Running ffmpeg under strace revealed all the system calls it uses to encode an example video. The following modifications to ffmpeg limit the available system calls to the minimum it needs to do its job. The list of allowed calls is in the sock_filter filter structure.
Note that this syntax is adopted from this example.
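In general, a sock_filter allow-list has the shape sketched below. This is an illustration only, not the ffmpeg patch: the four allowed calls are a toy list (the real list for ffmpeg is much longer), and the names ALLOW_SYSCALL, NATIVE_AUDIT_ARCH, install_filter, and filter_demo are hypothetical. The sketch assumes x86-64 or AArch64.

```c
#define _GNU_SOURCE
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* If the loaded syscall number equals `name`, return ALLOW; else fall through. */
#define ALLOW_SYSCALL(name) \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_##name, 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

static struct sock_filter filter[] = {
    /* Kill on an unexpected architecture, where syscall numbers differ. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NATIVE_AUDIT_ARCH, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    /* Load the syscall number and compare it against the allow-list. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    ALLOW_SYSCALL(read),
    ALLOW_SYSCALL(write),
    ALLOW_SYSCALL(exit),
    ALLOW_SYSCALL(exit_group),
    /* Everything else is fatal. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
};

static int install_filter(void)
{
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };
    /* Unprivileged processes must set this before installing a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Run the filter in a forked child. Returns the signal that killed the
 * child, its exit status if it exited normally, or -1 on fork/wait error. */
int filter_demo(int make_forbidden_call)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_filter() != 0)
            _exit(2);
        if (make_forbidden_call)
            syscall(SYS_getppid);   /* not on the list: the child dies here */
        _exit(0);                   /* exit_group is on the list */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : WEXITSTATUS(status);
}
```

With this sketch, a child that sticks to the list exits normally, while one that calls getppid should be terminated with SIGSYS.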
Configuring the list of allowed calls in BPF notation is rather cumbersome. There are a few other alternatives which are simpler. First, libseccomp provides a function-call based approach to setting up a filter. For example:
If the system uses systemd and the process is defined by a service, the .service file can specify which system calls to allow with the SystemCallFilter option. As there are many system calls, systemd also defines several named sets of calls. For example, the following configuration should be sufficient if ffmpeg were run as a service:
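A unit fragment along these lines is one possibility. It is a sketch under assumptions: the unit name, binary path, and file arguments are hypothetical, and whether the chosen sets cover every call ffmpeg needs would have to be confirmed with strace. The set names themselves are predefined by systemd, and `systemd-analyze syscall-filter` lists their contents.

```ini
# ffmpeg-encode.service (hypothetical unit name, paths, and arguments)
[Service]
ExecStart=/usr/bin/ffmpeg -i /srv/media/input.mp4 /srv/media/output.mkv
# Start from the predefined allow-list for typical services ...
SystemCallFilter=@system-service
# ... then subtract a group an encoder should not need.
SystemCallFilter=~@privileged
# Fail filtered calls with EPERM instead of killing the process.
SystemCallErrorNumber=EPERM
```

Because the first SystemCallFilter line is an allow-list, the second line removes calls from it rather than starting a separate deny-list.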
Performance cost
Adding checks for allowed system calls has some performance cost, as the filters take a non-zero amount of time to run. The question is whether that cost is measurable or negligibly small.
To quantify the performance impact, two experiments were conducted. The first encoded a small video 20 times in succession using ffmpeg, once with the filter from the earlier diff installed and once without. See this post for details on the experiment and the video file used.
The results from the first experiment are shown in the following two box plots (raw data here).
The results do not show an increase in the time needed to encode the example file. The slight reduction in the mean is probably an artifact of the small number of samples in the experiment.
Because this result was unsatisfying, a second experiment was run. In this experiment, the same filters used in ffmpeg were optionally installed, then the following system calls were attempted in a loop for 100,000 iterations:
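A loop of this kind can be timed with a sketch like the one below. The call used here (getppid, invoked through syscall() so that it always enters the kernel) is only a stand-in chosen for illustration, not necessarily one of the calls measured in the experiment, and the function name time_syscall_loop is hypothetical. The experiment measured the duration of the whole program, whereas this sketch times only the loop.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Time `iterations` invocations of a cheap system call.
 * Returns the elapsed wall-clock time in seconds. */
double time_syscall_loop(long iterations)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        syscall(SYS_getppid);   /* stand-in call; every call passes any installed filter */
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Running the loop once before and once after installing a filter, and repeating the pair many times, gives two distributions that can be compared.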
The duration of the resulting program was captured 100 times, both with and without the filters installed. The following two box plots show the result of this experiment:
This shows more clearly that the filters have a performance impact. On average there was a 0.58-second increase, which is 5.84 microseconds per system call, an overhead of roughly 44% per call. However, because the much larger ffmpeg experiment showed no noticeable difference, I must conclude that the overall time spent executing the system calls far exceeds the added filtering overhead.
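The per-call figures follow from simple division. The check below assumes that each of the 100,000 iterations counts as one call and that the 5.84 microsecond figure comes from the unrounded mean difference. The unfiltered and filtered per-call times are derived here from the 44% figure and do not appear in the measurements themselves.

```latex
\[
\Delta t_{\text{call}} = \frac{0.58\ \text{s}}{100{,}000} = 5.8\ \mu\text{s},
\qquad
t_{\text{unfiltered}} \approx \frac{5.84\ \mu\text{s}}{0.44} \approx 13.3\ \mu\text{s},
\qquad
t_{\text{filtered}} \approx 13.3\ \mu\text{s} + 5.84\ \mu\text{s} \approx 19.1\ \mu\text{s}.
\]
```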
Conclusion
Enabling filters that prevent a process from executing some system calls is a viable, if limited, form of sandboxing. Making changes on a per-process basis may be cumbersome, as the source of each program would need to be updated. If the system uses systemd and the processes in question are services, it is easier to define the system call limits in configuration files.
The overhead for a simple system call may seem high, but it is easily hidden when the system calls themselves take longer to execute. If one can determine which system calls are valid for an application, seccomp-bpf is a good way to limit one’s risk and exposure.