January 22, 2016
Containers are lightweight virtualization tools that give processes the illusion of separation and isolation. They are not a security technology, but they do offer some isolation, for example of filesystem and network operations, using Linux namespaces. However, as more containers are deployed we continue to find problems that need to be addressed. Among them, resource accounting and container privileges are the top culprits.
For now we will give you a quick overview of Linux namespaces. In part 2 of this blog series we will go deeper into user namespaces and the problems that Linux containers face today.
Introduction to namespaces
In Linux, containers are implemented using several different namespace technologies. This blog post presents them. Later we will go a bit deeper into user namespaces and how containers may take advantage of them.
Linux offers several types of namespaces, which are usually created using the clone() system call. They are:
Mount namespaces isolate the filesystem hierarchy that a group of processes may see. Each group of processes with its own mount namespace sees a different set of mount points, which makes it possible to restrict and isolate filesystem data and operations. The system calls ‘mount(2)’ and ‘umount(2)’ operate only on the mount namespace of the calling process, so they affect only the processes that live in that same mount namespace.
PID namespaces isolate the process ID number space. PIDs in a new namespace start at 1 as if they were in a standalone system and are unique inside that PID namespace. PID 1 of any PID namespace is considered to be the “init” process of that namespace, and orphaned children in a PID namespace will always be reattached to PID 1 of the same containing PID namespace.
UTS namespaces isolate the system identifiers nodename and domainname. This allows each UTS namespace to have its own hostname and NIS domain name.
IPC namespaces isolate some interprocess communication resources and objects like System V IPC objects and POSIX message queues. Each IPC namespace will have its own System V IPC identifiers that are visible to all processes of that same namespace.
Network namespaces isolate network resources. Each network namespace will have its own IP addresses, routing tables, devices, etc. Applications in different network namespaces can each bind to port 80 in their own namespace, which makes it possible to run multiple web servers on the same host.
User namespaces isolate the user and group IDs, making them appear different inside and outside the user namespace. They also operate on capabilities and privileges, which makes it possible to separate these between namespaces. In other use cases, an unprivileged user may gain more privileged capabilities inside a user namespace. In these blog posts we will focus on setting up user namespaces through a privileged container manager.
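As a small illustration of the namespace types listed above, the sketch below reads the symbolic links that the kernel exposes under /proc/<pid>/ns/ and reports which namespace of each type a process belongs to. It assumes a Linux system with /proc mounted. The helper names `list_namespaces` and `same_namespace` are ours, introduced only for this example, and entries that are missing or unreadable are simply skipped.

```python
import os

# The six namespace types described above, using the entry names that the
# kernel exposes under /proc/<pid>/ns/.
NAMESPACE_TYPES = ("mnt", "pid", "uts", "ipc", "net", "user")


def list_namespaces(pid="self"):
    """Return {type: link target} for the namespaces of the given process.

    Each entry under /proc/<pid>/ns/ is a symbolic link whose target
    identifies the namespace. Entries that are missing or unreadable
    (older kernel, no /proc, insufficient permissions) are skipped.
    """
    result = {}
    for ns_type in NAMESPACE_TYPES:
        path = os.path.join("/proc", str(pid), "ns", ns_type)
        try:
            result[ns_type] = os.readlink(path)
        except OSError:
            continue
    return result


def same_namespace(pid_a, pid_b, ns_type):
    """Return True if both processes share the namespace of the given type."""
    a = list_namespaces(pid_a).get(ns_type)
    b = list_namespaces(pid_b).get(ns_type)
    return a is not None and a == b


if __name__ == "__main__":
    for ns_type, target in list_namespaces().items():
        print(f"{ns_type:5} {target}")
```

Two processes whose links for a given type have the same target share that namespace, so comparing the output for a process on the host with one inside a container is a quick way to see which resources are actually isolated.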
The figure below gives an overview of user namespaces. For detailed information we recommend the excellent LWN write-up “Namespaces in operation, part 5: User namespaces”.
The figure above shows a specific setup of a user namespace hierarchy. The first user namespace, which is the initial user namespace of the host, maps the entire user ID (UID) range. The level 1 namespace, which could be the container's user namespace, maps UIDs 1000-5999 of level 0 to 0-4999 inside its own namespace. The level 2 namespace maps the UID range 1000-1499 of level 1 to 0-499 inside its own namespace, and so on.
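To make the arithmetic of this hierarchy concrete, here is a minimal sketch that translates a UID from a nested namespace up to the host. It models each mapping as an "inside outside count" triple, the three-column format used by /proc/<pid>/uid_map. The functions `parse_uid_map`, `to_parent` and `to_host` are hypothetical helpers written for this example and are not part of any kernel or library API.

```python
def parse_uid_map(text):
    """Parse 'inside outside count' lines into a list of integer triples."""
    entries = []
    for line in text.strip().splitlines():
        inside, outside, count = (int(field) for field in line.split())
        entries.append((inside, outside, count))
    return entries


def to_parent(uid, uid_map):
    """Translate a UID inside a namespace to the UID in its parent namespace.

    Returns None when the UID falls outside every mapped range.
    """
    for inside, outside, count in uid_map:
        if inside <= uid < inside + count:
            return outside + (uid - inside)
    return None


def to_host(uid, maps_inner_to_outer):
    """Apply each level's map in turn, from the innermost namespace outwards."""
    for uid_map in maps_inner_to_outer:
        if uid is None:
            return None
        uid = to_parent(uid, uid_map)
    return uid


# The setup described above.
LEVEL1_MAP = parse_uid_map("0 1000 5000")  # level 1: 0-4999 <- level 0: 1000-5999
LEVEL2_MAP = parse_uid_map("0 1000 500")   # level 2: 0-499  <- level 1: 1000-1499

if __name__ == "__main__":
    # UID 0 in level 2 is UID 1000 in level 1, which is UID 2000 on the host.
    print(to_host(0, [LEVEL2_MAP, LEVEL1_MAP]))
```

So "root" (UID 0) inside the level 2 namespace is UID 1000 in level 1 and the unprivileged UID 2000 on the host, while any UID outside a mapped range, such as 500 in level 2, has no equivalent in the parent.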