run
command for the Debian container:
--pid=container:apache
In other words, we have full access to the apache
container's process table from inside the Debian container.
Now try the following commands to see if we have access to the filesystem of the apache
container:
root@0237e1ebcc85: cd /proc/1/root
root@0237e1ebcc85: ls
bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
There is nothing too unusual in that directory listing. However, you might be surprised to learn that what we can see is actually the top level of the Apache container's filesystem and not the Debian container's. Proof of this can be found by using this path in the following ls
command:
root@0237e1ebcc85: ls usr/local/apache2/htdocs
usr/local/apache2/htdocs/index.html
As suspected, there's an HTML file sitting within the apache2
directory:
root@0237e1ebcc85:/proc/1/root# cat usr/local/apache2/htdocs/index.html
<html><body><h1>It works!</h1></body></html>
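The same view can be wrapped in a small helper for any process ID, not just PID 1. What follows is a minimal sketch, assuming a Linux /proc filesystem and sufficient privileges to read the target process's root link. The function name list_proc_root and the PROC_DIR override (handy for experimenting against a fake directory tree) are hypothetical and are not part of Docker or any standard tool.

```shell
#!/bin/sh
# list_proc_root PID [SUBPATH]
# Lists a directory as seen through /proc/PID/root, that is, relative to the
# root filesystem of process PID rather than our own.
# PROC_DIR is a hypothetical override for experiments; it defaults to /proc.
list_proc_root() {
    pid="$1"
    subpath="${2:-.}"
    root="${PROC_DIR:-/proc}/${pid}/root"

    # The link is missing if the process does not exist, and unreadable
    # if we lack the privileges to follow it.
    if [ ! -d "${root}/" ]; then
        echo "cannot access ${root}" >&2
        return 1
    fi

    ls "${root}/${subpath}"
}
```

Inside the Debian container, list_proc_root 1 usr/local/apache2/htdocs would be expected to show the same index.html file as the manual steps above.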
We have proven that we have visibility of the Apache container's process table and its filesystem. Next, we will see what access this switch offers us: --net=container:apache.
Still inside the Debian container, we will run this command:
root@0237e1ebcc85:/proc/1/root# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
10: eth0@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever
The slightly abbreviated output from the ip a command shows two network interfaces: lo for the loopback interface and eth0, which has the IP address 172.17.0.2/16.
Let's exit the Debian container by pressing Ctrl+D and return to our normal system prompt to run a quick test. We named the container apache, so with the following inspect command we can view the end of the output to get the IP address for the Apache container:
$ docker inspect apache | tail -20
Listing 1.2 shows slightly abbreviated output from that command and, lo and behold, in the IPAddress field we can see the same IP address that we saw from within the Debian container a moment ago: "IPAddress": "172.17.0.2".
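Rather than reading the address off the tail of the output by eye, the field can be filtered out of the JSON text. The following is only a sketch: extract_ip is a hypothetical helper name, and it assumes the inspect output contains "IPAddress" fields formatted like the one in Listing 1.2.

```shell
#!/bin/sh
# extract_ip
# Reads `docker inspect` style JSON on stdin and prints the value of the
# first non-empty "IPAddress" field. Fields such as "GlobalIPv6Address" do
# not match because the pattern requires the opening quote directly before
# the word IPAddress.
extract_ip() {
    sed -n 's/.*"IPAddress": *"\([^"][^"]*\)".*/\1/p' | head -n 1
}
```

On the host this could be used as docker inspect apache | extract_ip. Docker's own --format option, for example docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' apache, is an alternative that avoids text matching altogether.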
Listing 1.2: The External View of the Apache Container's Network Stack
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": […snip…]
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.2",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:11:00:02",
                    "DriverOpts": null
                }
            }
        }
    }
]
Head back into the Debian container now with the same command as earlier, shown here:
$ docker run --rm -it --name debian --pid=container:apache \
  --net=container:apache --cap-add sys_admin debian:latest
To prove that the networking is fully passed across to the Debian container from the Apache container, we will install the curl
command inside the container:
root@0237e1ebcc85:/# apt update; apt install curl -y
After a little patience (if you've stopped the Debian container, you'll need to run apt update again before installing curl; otherwise, you can skip that step), we can check what the intertwined network stack means from inside the container with this command:
root@0237e1ebcc85:/# curl -v http://localhost:80
<html><body><h1>It works!</h1></body></html>
This time the HTML file saying “It works!” comes not straight from the filesystem but over the network, served on TCP port 80.
As we have been able to demonstrate, a Linux system does not need much encouragement to offer visibility between containers and across all the major components of a container. These examples should offer an insight into how containers reside on a host and how easy it is to potentially open security holes between containerized workloads.
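One way to confirm this kind of sharing from the host is to compare the namespace links that the kernel exposes under /proc/PID/ns. Two processes whose links resolve to the same identifier live in the same namespace. The sketch below assumes a Linux /proc filesystem and a readlink command. The function name same_namespace and the PROC_DIR override are hypothetical helpers introduced only for illustration.

```shell
#!/bin/sh
# same_namespace PID_A PID_B TYPE
# Succeeds (exit 0) when both processes are in the same namespace of the
# given TYPE (for example net, pid, or mnt), fails with 1 when they differ,
# and returns 2 when a link cannot be read.
# PROC_DIR is a hypothetical override for experiments; it defaults to /proc.
same_namespace() {
    proc_dir="${PROC_DIR:-/proc}"

    # Each link resolves to a string such as net:[4026532001].
    ns_a=$(readlink "${proc_dir}/$1/ns/$3") || return 2
    ns_b=$(readlink "${proc_dir}/$2/ns/$3") || return 2

    [ "$ns_a" = "$ns_b" ]
}
```

On the host, the two process IDs could be obtained with docker inspect --format '{{.State.Pid}}' apache and the equivalent command for the debian container. Reading another process's namespace links usually requires root.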
Again, because containers are definitely not the same as virtual machines, their security differs greatly and demands close attention. If a container is run with excessive privileges or punches holes through the security protection offered by kernel capabilities, then not only are other containers at serious risk but the host machine itself is too. The key concerns of a “container escape,” where it is possible to “break out” of a host's relatively standard security controls, include the following:
Disrupting services on any or all containers on the host, causing outages
Attacking the underlying host to cause a denial of service by triggering a stress event that exhausts available resources such as RAM, CPU, disk space, or I/O
Deleting data on any locally mounted volumes directly on the host machine or wiping critical host directories, causing system failure
Embedding processes on a host that may act as a form of advanced persistent threat (APT), which could lie dormant for a period of time before being taken advantage of at a later date
Other Containers
A little-known fact is that serverless technologies also embrace containerization, or more accurately lightweight virtualization when it comes to AWS Lambda. Making use of KVM as mentioned earlier, AWS uses Firecracker to provide what it calls MicroVMs. When Firecracker was launched, AWS explicitly stated that security was its top priority and ensured that multiple levels of isolation were introduced to provide defense in depth. From a performance perspective, the MicroVMs can apparently start up in a remarkable eighth of a second or so. Firecracker is an active open source project and an intriguing technology:
github.com/firecracker-microvm/firecracker
As mentioned earlier, the security model is a familiar one, according to the AWS site: “The Firecracker process is jailed using cgroups and seccomp BPF, and has access to a small, tightly controlled list of system calls.”
Apparently, at least according to this page on the AWS forums (forums.aws.amazon.com/thread.jspa?threadID=263968), restrictions are applied to the containerized service, such as limits on various kernel capabilities. These are dropped for security purposes, and the restrictions might extend to syscalls like PTRACE, which allows the monitoring of, and potentially the control of, other processes. Other more obvious services, such as SMTP, are disallowed to prevent spam from leaving a function. And removing the CAP_NET_RAW capability makes it impossible to spoof IP addresses or use raw sockets for capturing traffic.
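Whether a given capability has been dropped can be checked from inside any Linux environment by decoding the CapEff bitmask in /proc/PID/status. This is a minimal sketch rather than a description of how AWS applies its restrictions. The helper names has_capability and effective_caps are hypothetical, and the bit numbers are the ones defined in the kernel's linux/capability.h header.

```shell
#!/bin/sh
# Capability bit numbers as defined in linux/capability.h.
CAP_NET_RAW=13
CAP_SYS_PTRACE=19
CAP_SYS_ADMIN=21

# has_capability MASK BIT
# MASK is a hexadecimal capability mask without the 0x prefix, as shown on
# the CapEff line of /proc/PID/status. Succeeds when bit BIT is set.
has_capability() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# effective_caps [PID]
# Prints the effective capability mask of a process (default: the caller).
effective_caps() {
    awk '/^CapEff:/ { print $2 }' "/proc/${1:-self}/status"
}
```

For example, has_capability "$(effective_caps)" "$CAP_NET_RAW" succeeds only if the current process can still open raw sockets. A mask such as 00000000a80425fb, which is often seen in a container started with Docker's default capability set (an assumption about your environment), has bit 13 set but not bits 19 or 21.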
Another approach to running containers in a more secure fashion is to lean on hardware virtualization to a greater degree. One of the earlier pioneers of containerization was CoreOS (known for a number of other products, such as etcd, which is prevalent in most modern Kubernetes distributions). CoreOS created a container runtime called rkt (pronounced “rock-it”) that is sadly now deprecated. The approach from rkt was to make use of KVM as a hypervisor. The premise (explained at coreos.com/rkt/docs/latest/running-kvm-stage1.html) was to use KVM, which provides efficient hardware-level virtualization, to spawn containers rather than systemd-nspawn
(wiki.debian.org/nspawn
), which can create a slim