Canonical Voices

Posts tagged with 'lxc'

Stéphane Graber

After 10 months of work, over a thousand contributions by 60 or so contributors, we’ve finally released LXC 1.0!

You may have followed my earlier series of blog post on LXC 1.0, well, everything I described in there is now available in a stable release which we intend to support for a long time.

In the immediate future, I expect most of LXC upstream will focus on dealing with the bug reports and questions which will no doubt follow this release, then we’ll have to discuss what our goals for LXC 1.1 are and setup a longer term roadmap to LXC 2.0.

But right now, I’m just happy to have LXC 1.0 out, get a lot more users to play with new technologies like unprivileged containers and play with our API in the various languages we support.

Thanks to everyone who made this possible!

Read more
Stéphane Graber

This is post 10 out of 10 in the LXC 1.0 blog post series.

Logging

Most LXC commands take two options:

  • -o, –logfile=FILE: Location of the logfile (defaults to stder)
  • -l, –logpriority=LEVEL: Log priority (defaults to ERROR)

The valid log priorities are:

  • FATAL
  • ALERT
  • CRIT
  • ERROR
  • WARN
  • NOTICE
  • INFO
  • DEBUG
  • TRACE

FATAL, ALERT and CRIT are mostly unused at this time, ERROR is pretty common and so are the others except for TRACE. If you want to see all possible log entries, set the log priority to TRACE.

There are also two matching configuration options which you can put in your container’s configuration:

  • lxc.logfile
  • lxc.loglevel

They behave exactly like their command like counterparts. However note that if the command line options are passed, any value set in the configuration will be ignored and instead will be overridden by those passed by the user.

When reporting a bug against LXC, it’s usually a good idea to attach a log of the container’s action with a logpriority of at least DEBUG.

API debugging

When debugging a problem using the API it’s often a good idea to try and re-implement the failing bit of code in C using liblxc directly, that helps get the binding out of the way and usually leads to cleaner stack traces and easier bug reports.

It’s also useful to set lxc.loglevel to DEBUG using set_config_item on your container so you can get a log of what LXC is doing.

Testing

Before digging to deep into an issue with the code you are working on, it’s usually a good idea to make sure that LXC itself is behaving as it should on your machine.

To check that, run “lxc-checkconfig” and look for any missing kernel feature, if all looks good, then install (or build) the tests. In Ubuntu, those are shipped in a separate “lxc-tests” package. Most of those tests are expecting to be run on an Ubuntu system (patches welcome…) but should do fine on any distro that’s compatible with the lxc-ubuntu template.

Run each of the lxc-test-* binaries as root and note any failure. Note that it’s possible that they leave some cruft behind on failure, if so, please cleanup any of those leftover containers before processing to the next test as unfortunately that cruft may cause failure by itself.

Reporting bugs

The primary LXC bug tracker is available at: https://github.com/lxc/lxc/issues

You may also report bugs directly through the distributions (though it’s often preferred to still file an upstream bug and then link the two), for example for Ubuntu, LXC bugs are tracked here: https://bugs.launchpad.net/ubuntu/+source/lxc

If you’ve already done some work tracking down the bug, you may also directly contact us on our mailing-list (see below).

Sending patches

We always welcome contributions and are very happy to have such an active development community around LXC (Over 60 people contributed to LXC 1.0). We don’t have many rules governing contributions, we just ask that your contributions be properly licensed and that you own the copyright on the code you are sending us (and indicate so by putting a Signed-off-by line in your commit).

As for the licensing, anything which ends up in the library (liblxc) or its bindings must be LGPLv2.1+ or compatible with it and not adding any additional restriction. Standalone binaries and scripts can either be LGLPv2.1+ (the project default) or GPLv2. If unsure, LGPLv2.1+ is usually a safe bet for any new file in LXC.

Patches may be sent using two different ways:

  • Inline to the lxc-devel@lists.linuxcontainers.org (using git send-email or similar)
  • Using a pull request on github (we will then grab the .patch URL and treat it as if they were e-mails sent to our list)

Getting in touch with us

The primary way of contacting the upstream LXC team is through our mailing-lists. We have two, one for LXC development and one for LXC users questions:

For more real-time discussion, you can also find a lot of LXC users and most of the developers in #lxcontainers on irc.freenode.net.

Final notes

So this is my final blog post before LXC 1.0 is finally released. We’re currently at rc3 with an rc4 coming a bit later today and a final release scheduled for tomorrow evening or Thursday morning.

I hope you have enjoyed this blog post series and that it’ll be a useful reference for people deploying LXC 1.0.

Read more
Stéphane Graber

This is post 9 out of 10 in the LXC 1.0 blog post series.

Some starting notes

This post uses unprivileged containers, this isn’t an hard requirement but makes a lot of sense for GUI applications. Besides, since you followed the whole series, you have those setup anyway, right?

I’ll be using Google Chrome with the Google Talk and Adobe Flash plugins as “hostile” piece of software that I do not wish to allow to run directly on my machine.
Here are a few reasons why:

  • Those are binaries I don’t have source for.
    (That alone is usually enough for me to put an application in a container)
  • Comes from an external (non-Ubuntu) repository which may then push anything they wish to my system.
    (Could be restricted with careful apt pining)
  • Installs a daily cronjob which will re-add said repository and GPG keys should I for some reason choose to remove them.
    (Apparently done to have the repository re-added after dist-upgrades)
  • Uses a setuid wrapper to setup their sandbox. Understandable as they switch namespaces and user namespaces aren’t widespread yet.
    (This can be worked around by turning off the sandbox. The code for the sandbox is also available from the chromium project, though there’s no proof that the binary build doesn’t have anything added to it)
  • I actually need to use those piece of software because of my work environment and because the web hasn’t entirely moved away from flash yet…

While what I’ll be describing below is a huge step up for security in my opinion, it’s still not ideal and a few compromises have to be made to make those software even work. Those are basically access to:

  • pulseaudio: possibly recording you
  • X11: possibly doing key logging or taking pictures of your screen
  • dri/snd devices: can’t think of anything that’s not already covered by the first two, but I’m sure someone with a better imagination than mine will come up with something

But there’s only so much you can do while still having the cool features, so the best you can do is make sure never to run the container while doing sensitive work.

Running Google chrome in a container

So now to the actual fun. The plan is rather simple, I want a simple container, running a stable, well supported, version of Ubuntu, in there I’ll install Google Chrome and any plugin I need, then integrate it with my desktop.

First of all, let’s get ourselves an Ubuntu 12.04 i386 container as that’s pretty well supported by most ISVs:

lxc-create -t download -n precise-gui -- -d ubuntu -r precise -a i386

Let’s tweak the configuration a bit by adding the following to ~/.local/share/lxc/precise-gui/config (replacing USERNAME appropriately):

lxc.mount.entry = /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry = /dev/snd dev/snd none bind,optional,create=dir
lxc.mount.entry = /tmp/.X11-unix tmp/.X11-unix none bind,optional,create=dir
lxc.mount.entry = /dev/video0 dev/video0 none bind,optional,create=file

lxc.hook.pre-start = /home/USERNAME/.local/share/lxc/precise-gui/setup-pulse.sh

Still in that same config file, you’ll have to replace your existing (or similar):

lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536

By something like (this assume your user’s uid/gid is 1000/1000):

lxc.id_map = u 0 100000 1000
lxc.id_map = g 0 100000 1000
lxc.id_map = u 1000 1000 1
lxc.id_map = g 1000 1000 1
lxc.id_map = u 1001 101001 64535
lxc.id_map = g 1001 101001 64535

So those mappings basically mean that your container has 65536 uids and gids mapped to it, starting at 0 up to 65535 in the container. Those are mapped to host ids 100000 to 165535 with one exception, uid and gid 1000 isn’t translated. That trick is needed so that your user in the container can access the X socket, pulseaudio socket and DRI/snd devices just as your own user can (this saves us a whole lot of configuration on the host).

Now that we’re done with the config file, let’s create that setup-pulse.sh script:

#!/bin/sh
PULSE_PATH=$LXC_ROOTFS_PATH/home/ubuntu/.pulse_socket

if [ ! -e "$PULSE_PATH" ] || [ -z "$(lsof -n $PULSE_PATH 2>&1)" ]; then
    pactl load-module module-native-protocol-unix auth-anonymous=1 \
        socket=$PULSE_PATH
fi

Make sure the file is executable or LXC will ignore it!
That script is fairly simple, it’ll simply tell pulseaudio on the host to bind /home/ubuntu/.pulse_socket in the container, checking that it’s not already setup.

Finally, run the following to fix the permissions of your container’s home directory:

sudo chown -R 1000:1000 ~/.local/share/lxc/precise-gui/rootfs/home/ubuntu

And that’s all that’s needed in the LXC side. So let’s start the container and install Google Chrome and the Google Talk plugin in there:

lxc-start -n precise-gui -d
lxc-attach -n precise-gui -- umount /tmp/.X11-unix
lxc-attach -n precise-gui -- apt-get update
lxc-attach -n precise-gui -- apt-get dist-upgrade -y
lxc-attach -n precise-gui -- apt-get install wget ubuntu-artwork dmz-cursor-theme ca-certificates pulseaudio -y
lxc-attach -n precise-gui -- wget https://dl.google.com/linux/direct/google-chrome-stable_current_i386.deb -O /tmp/chrome.deb
lxc-attach -n precise-gui -- wget https://dl.google.com/linux/direct/google-talkplugin_current_i386.deb -O /tmp/talk.deb
lxc-attach -n precise-gui -- dpkg -i /tmp/chrome.deb /tmp/talk.deb
lxc-attach -n precise-gui -- apt-get -f install -y
lxc-attach -n precise-gui -- sudo -u ubuntu mkdir -p /home/ubuntu/.pulse/
echo "disable-shm=yes" | lxc-attach -n precise-gui -- sudo -u ubuntu tee /home/ubuntu/.pulse/client.conf
lxc-stop -n precise-gui

At this point, everything you need is installed in the container.
To make your life easier, create the following launcher script, let’s call it “start-chrome” and put it in the container’s configuration directory (next to config and setup-pulse.sh):

#!/bin/sh
CONTAINER=precise-gui
CMD_LINE="google-chrome --disable-setuid-sandbox $*"

STARTED=false

if ! lxc-wait -n $CONTAINER -s RUNNING -t 0; then
    lxc-start -n $CONTAINER -d
    lxc-wait -n $CONTAINER -s RUNNING
    STARTED=true
fi

PULSE_SOCKET=/home/ubuntu/.pulse_socket

lxc-attach --clear-env -n $CONTAINER -- sudo -u ubuntu -i \
    env DISPLAY=$DISPLAY PULSE_SERVER=$PULSE_SOCKET $CMD_LINE

if [ "$STARTED" = "true" ]; then
    lxc-stop -n $CONTAINER -t 10
fi

Make sure the script is executable or the next step won’t work. This script will check if the container is running, if not, start it (and remember it did), then spawn google-chrome with the right environment set (and disabling its built-in sandbox as for some obscure reasons, it dislikes user namespaces), once google-chrome exits, the container is stopped.

To make things shinier, you may now also create ~/.local/share/applications/google-chrome.desktop containing:

[Desktop Entry]
Version=1.0
Name=Google Chrome
Comment=Access the Internet
Exec=/home/USERNAME/.local/share/lxc/precise-gui/start-chrome %U
Icon=/home/USERNAME/.local/share/lxc/precise-gui/rootfs/opt/google/chrome/product_logo_256.png
Type=Application
Categories=Network;WebBrowser;

Don’t forget to replace USERNAME to your own username so that both paths are valid.

And that’s it! You should now find a Google Chrome icon somewhere in your desktop environment (menu, dash, whatever…). Clicking on it will start Chrome which can be used pretty much as usual, when closed, the container will shutdown.
You may want to setup extra symlinks or bind-mount to make it easier to access things like the Downloads folder but that really depends on what you’re using the container for.

Chrome running in LXC

Obviously, the same process can be used for many different piece of software.

Skype

Quite a few people have contacted me asking about running Skype in that same container. I won’t give you a whole step by step guide as the one for Chrome cover 99% of what you need to do for Skype.

However there are two tricks you need to be aware of to get Skype to work properly:

  • Set “QT_X11_NO_MITSHM” to “1″
    (otherwise you get a blank window as it tries to use shared memory)
  • Set “GNOME_DESKTOP_SESSION_ID” to “this-is-deprecated”
    (otherwise you get an ugly Qt theme)

Those two should be added after the “env” in the launcher script you’ll write for Skype.

Apparently on some NVidia system, you may also need to set an additional environment variable (possibly useful not only for Skype):
LD_PRELOAD=/usr/lib/i386-linux-gnu/mesa/libGL.so.1

Steam

And finally, yet another commonly asked one, Steam.

That one actually doesn’t require anything extra in its environment, just grab the .deb, install it in the container, run an “apt-get -f install” to install any remaining dependency, create a launcher script and .desktop and you’re done.
I’ve been happily playing a few games (thanks to Valve giving those to all Ubuntu and Debian developers) without any problem so far.

Read more
Stéphane Graber

This is post 8 out of 10 in the LXC 1.0 blog post series.

The API

The first version of liblxc was introduced in LXC 0.9 but it was very much at an experimental state. LXC 1.0 however will ship with a much more complete API, covering all of LXC’s features. We’ve actually been rebasing all of our tools (lxc-*) to using that API rather than doing direct calls to the internal functions.

The API also comes with a whole set of tests which we run as part of our continuous integration setup and before distro uploads.

There are also quite a few bindings for those who don’t feel like writing C, we have lua and python3 bindings in-tree upstream and there are official out-of-tree bindings for Go and ruby.

The API documentation can be found at:
https://qa.linuxcontainers.org/master/current/doc/api/

It’s not necessarily the most readable API documentation ever and certainly could do with some examples, especially for the bindings, but it does cover all functions that are exported over the API. Any help improving our API documentation is very much welcome!

The basics

So let’s start with a very simple example of the LXC API using C, the following example will create a new container struct called “apicontainer”, create a root filesystem using the new download template, start the container, print its state and PID number, then attempt a clean shutdown before killing it.

#include <stdio.h>

#include <lxc/lxccontainer.h>

int main() {
    struct lxc_container *c;
    int ret = 1;

    /* Setup container struct */
    c = lxc_container_new("apicontainer", NULL);
    if (!c) {
        fprintf(stderr, "Failed to setup lxc_container struct\n");
        goto out;
    }

    if (c->is_defined(c)) {
        fprintf(stderr, "Container already exists\n");
        goto out;
    }

    /* Create the container */
    if (!c->createl(c, "download", NULL, NULL, LXC_CREATE_QUIET,
                    "-d", "ubuntu", "-r", "trusty", "-a", "i386", NULL)) {
        fprintf(stderr, "Failed to create container rootfs\n");
        goto out;
    }

    /* Start the container */
    if (!c->start(c, 0, NULL)) {
        fprintf(stderr, "Failed to start the container\n");
        goto out;
    }

    /* Query some information */
    printf("Container state: %s\n", c->state(c));
    printf("Container PID: %d\n", c->init_pid(c));

    /* Stop the container */
    if (!c->shutdown(c, 30)) {
        printf("Failed to cleanly shutdown the container, forcing.\n");
        if (!c->stop(c)) {
            fprintf(stderr, "Failed to kill the container.\n");
            goto out;
        }
    }

    /* Destroy the container */
    if (!c->destroy(c)) {
        fprintf(stderr, "Failed to destroy the container.\n");
        goto out;
    }

    ret = 0;
out:
    lxc_container_put(c);
    return ret;
}

So as you can see, it’s not very difficult to use, most functions are fairly straightforward and error checking is pretty simple (most calls are boolean and errors are printed to stderr by LXC depending on the loglevel).

Python3 scripting

As much fun as C may be, I usually like to script my containers and C isn’t really the best language for that. That’s why I wrote and maintain the official python3 binding.

The equivalent to the example above in python3 would be:

import lxc
import sys

# Setup the container object
c = lxc.Container("apicontainer")
if c.defined:
    print("Container already exists", file=sys.stderr)
    sys.exit(1)

# Create the container rootfs
if not c.create("download", lxc.LXC_CREATE_QUIET, {"dist": "ubuntu",
                                                   "release": "trusty",
                                                   "arch": "i386"}):
    print("Failed to create the container rootfs", file=sys.stderr)
    sys.exit(1)

# Start the container
if not c.start():
    print("Failed to start the container", file=sys.stderr)
    sys.exit(1)

# Query some information
print("Container state: %s" % c.state)
print("Container PID: %s" % c.init_pid)

# Stop the container
if not c.shutdown(30):
    print("Failed to cleanly shutdown the container, forcing.")
    if not c.stop():
        print("Failed to kill the container", file=sys.stderr)
        sys.exit(1)

# Destroy the container
if not c.destroy():
    print("Failed to destroy the container.", file=sys.stderr)
    sys.exit(1)

Now for that specific example, python3 isn’t that much simpler than the C equivalent.

But what if we wanted to do something slightly more tricky, like iterating through all existing containers, start them (if they’re not already started), wait for them to have network connectivity, then run updates and shut them down?

import lxc
import sys

for container in lxc.list_containers(as_object=True):
    # Start the container (if not started)
    started=False
    if not container.running:
        if not container.start():
            continue
        started=True

    if not container.state == "RUNNING":
        continue

    # Wait for connectivity
    if not container.get_ips(timeout=30):
        continue

    # Run the updates
    container.attach_wait(lxc.attach_run_command,
                          ["apt-get", "update"])
    container.attach_wait(lxc.attach_run_command,
                          ["apt-get", "dist-upgrade", "-y"])

    # Shutdown the container
    if started:
        if not container.shutdown(30):
            container.stop()

The most interesting bit in the example above is the attach_wait command, which basically lets your run a standard python function in the container’s namespaces, here’s a more obvious example:

import lxc

c = lxc.Container("p1")
if not c.running:
    c.start()

def print_hostname():
    with open("/etc/hostname", "r") as fd:
        print("Hostname: %s" % fd.read().strip())

# First run on the host
print_hostname()

# Then on the container
c.attach_wait(print_hostname)

if not c.shutdown(30):
    c.stop()

And the output of running the above:

stgraber@castiana:~$ python3 lxc-api.py
/home/stgraber/<frozen>:313: Warning: The python-lxc API isn't yet stable and may change at any point in the future.
Hostname: castiana
Hostname: p1

It may take you a little while to wrap your head around the possibilities offered by that function, especially as it also takes quite a few flags (look for LXC_ATTACH_* in the C API) which lets you control which namespaces to attach to, whether to have the function contained by apparmor, whether to bypass cgroup restrictions, …

That kind of flexibility is something you’ll never get with a virtual machine and the way it’s supported through our bindings makes it easier than ever to use by anyone who wants to automate custom workloads.

You can also use the API to script cloning containers and using snapshots (though for that example to work, you need current upstream master due to a small bug I found while writing this…):

import lxc
import os
import sys

if not os.geteuid() == 0:
    print("The use of overlayfs requires privileged containers.")
    sys.exit(1)

# Create a base container (if missing) using an Ubuntu 14.04 image
base = lxc.Container("base")
if not base.defined:
    base.create("download", lxc.LXC_CREATE_QUIET, {"dist": "ubuntu",
                                                   "release": "precise",
                                                   "arch": "i386"})

    # Customize it a bit
    base.start()
    base.get_ips(timeout=30)
    base.attach_wait(lxc.attach_run_command, ["apt-get", "update"])
    base.attach_wait(lxc.attach_run_command, ["apt-get", "dist-upgrade", "-y"])

    if not base.shutdown(30):
        base.stop()

# Clone it as web (if not already existing)
web = lxc.Container("web")
if not web.defined:
    # Clone base using an overlayfs overlay
    web = base.clone("web", bdevtype="overlayfs",
                     flags=lxc.LXC_CLONE_SNAPSHOT)

    # Install apache
    web.start()
    web.get_ips(timeout=30)
    web.attach_wait(lxc.attach_run_command, ["apt-get", "update"])
    web.attach_wait(lxc.attach_run_command, ["apt-get", "install",
                                             "apache2", "-y"])

    if not web.shutdown(30):
        web.stop()

# Create a website container based on the web container
mysite = web.clone("mysite", bdevtype="overlayfs",
                   flags=lxc.LXC_CLONE_SNAPSHOT)
mysite.start()
ips = mysite.get_ips(family="inet", timeout=30)
if ips:
    print("Website running at: http://%s" % ips[0])
else:
    if not mysite.shutdown(30):
        mysite.stop()

The above will create a base container using a downloaded image, then clone it using an overlayfs based overlay, add apache2 to it, then clone that resulting container into yet another one called “mysite”. So “mysite” is effectively an overlay clone of “web” which is itself an overlay clone of “base”.

 

So there you go, I tried to cover most of the interesting bits of our API with the examples above, though there’s much more available, for example, I didn’t cover the snapshot API (currently restricted to system containers) outside of the specific overlayfs case above and only scratched the surface of what’s possible to do with the attach function.

LXC 1.0 will release with a stable version of the API, we’ll be doing additions in the next few 1.x versions (while doing bugfix only updates to 1.0.x) and hope not to have to break the whole API for quite a while (though we’ll certainly be adding more stuff to it).

Read more
Stéphane Graber

This is post 7 out of 10 in the LXC 1.0 blog post series.

Introduction to unprivileged containers

The support of unprivileged containers is in my opinion one of the most important new features of LXC 1.0.

You may remember from previous posts that I mentioned that LXC should be considered unsafe because while running in a separate namespace, uid 0 in your container is still equal to uid 0 outside of the container, meaning that if you somehow get access to any host resource through proc, sys or some random syscalls, you can potentially escape the container and then you’ll be root on the host.

That’s what user namespaces were designed for and implemented. It was a multi-year effort to think them through and slowly push the hundreds of patches required into the upstream kernel, but finally with 3.12 we got to a point where we can start a full system container entirely as a user.

So how do those user namespaces work? Well, simply put, each user that’s allowed to use them on the system gets assigned a range of unused uids and gids, ideally a whole 65536 of them. You can then use those uids and gids with two standard tools called newuidmap and newgidmap which will let you map any of those uids and gids to virtual uids and gids in a user namespace.

That means you can create a container with the following configuration:

lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536

The above means that I have one uid map and one gid map defined for my container which will map uids and gids 0 through 65536 in the container to uids and gids 100000 through 165536 on the host.

For this to be allowed, I need to have those ranges assigned to my user at the system level with:

stgraber@castiana:~$ grep stgraber /etc/sub* 2>/dev/null
/etc/subgid:stgraber:100000:65536
/etc/subuid:stgraber:100000:65536

LXC has now been updated so that all the tools are aware of those unprivileged containers. The standard paths also have their unprivileged equivalents:

  • /etc/lxc/lxc.conf => ~/.config/lxc/lxc.conf
  • /etc/lxc/default.conf => ~/.config/lxc/default.conf
  • /var/lib/lxc => ~/.local/share/lxc
  • /var/lib/lxcsnaps => ~/.local/share/lxcsnaps
  • /var/cache/lxc => ~/.cache/lxc

Your user, while it can create new user namespaces in which it’ll be uid 0 and will have some of root’s privileges against resources tied to that namespace will obviously not be granted any extra privilege on the host.

One such thing is creating new network devices on the host or changing bridge configuration. To workaround that, we wrote a tool called “lxc-user-nic” which is the only SETUID binary part of LXC 1.0 and which performs one simple task.
It parses a configuration file and based on its content will create network devices for the user and bridge them. To prevent abuse, you can restrict the number of devices a user can request and to what bridge they may be added.

An example is my own /etc/lxc/lxc-usernet file:

stgraber veth lxcbr0 10

This declares that the user “stgraber” is allowed up to 10 veth type devices to be created and added to the bridge called lxcbr0.

Between what’s offered by the user namespace in the kernel and that setuid tool, we’ve got all that’s needed to run most distributions unprivileged.

Pre-requirements

All examples and instructions I’ll be giving below are expecting that you are running a perfectly up to date version of Ubuntu 14.04 (codename trusty). That’s a pre-release of Ubuntu so you may want to run it in a VM or on a spare machine rather than upgrading your production computer.

The reason to want something that recent is because the rough requirements for well working unprivileged containers are:

  • Kernel: 3.13 + a couple of staging patches (which Ubuntu has in its kernel)
  • User namespaces enabled in the kernel
  • A very recent version of shadow that supports subuid/subgid
  • Per-user cgroups on all controllers (which I turned on a couple of weeks ago)
  • LXC 1.0 beta2 or higher (released two days ago)
  • A version of PAM with a loginuid patch that’s yet to be in any released version

Those requirements happen to all be true of the current development release of Ubuntu as of two days ago.

LXC pre-built containers

User namespaces come with quite a few obvious limitations. For example in a user namespace you won’t be allowed to use mknod to create a block or character device as being allowed to do so would let you access anything on the host. Same thing goes with some filesystems, you won’t for example be allowed to do loop mounts or mount an ext partition, even if you can access the block device.

Those limitations while not necessarily world ending in day to day use are a big problem during the initial bootstrap of a container as tools like debootstrap, yum, … usually try to do some of those restricted actions and will fail pretty badly.

Some templates may be tweaked to work and workaround such as a modified fakeroot could be used to bypass some of those limitations but the goal of the LXC project isn’t to require all of our users to be distro engineers, so we came up with a much simpler solution.

I wrote a new template called “download” which instead of assembling the rootfs and configuration locally will instead contact a server which contains daily pre-built rootfs and configuration for most common templates.

Those images are built from our Jenkins server using a few machines I have on my home network (a set of powerful x86 builders and a quadcore ARM board). The actual build process is pretty straightforward, a basic chroot is assembled, then the current git master is downloaded, built and the standard templates are run with the right release and architecture, the resulting rootfs is compressed, a basic config and metadata (expiry, files to template, …) is saved, the result is pulled by our main server, signed with a dedicated GPG key and published on the public web server.

The client side is a simple template which contacts the server over https (the domain is also DNSSEC enabled and available over IPv6), grabs signed indexes of all the available images, checks if the requested combination of distribution, release and architecture is supported and if it is, grabs the rootfs and metadata tarballs, validates their signature and stores them in a local cache. Any container creation after that point is done using that cache until the time the cache entries expires at which point it’ll grab a new copy from the server.

The current list of images is (as can be requested by passing –list):

---
DIST      RELEASE   ARCH    VARIANT    BUILD
---
debian    wheezy    amd64   default    20140116_22:43
debian    wheezy    armel   default    20140116_22:43
debian    wheezy    armhf   default    20140116_22:43
debian    wheezy    i386    default    20140116_22:43
debian    jessie    amd64   default    20140116_22:43
debian    jessie    armel   default    20140116_22:43
debian    jessie    armhf   default    20140116_22:43
debian    jessie    i386    default    20140116_22:43
debian    sid       amd64   default    20140116_22:43
debian    sid       armel   default    20140116_22:43
debian    sid       armhf   default    20140116_22:43
debian    sid       i386    default    20140116_22:43
oracle    6.5       amd64   default    20140117_11:41
oracle    6.5       i386    default    20140117_11:41
plamo     5.x       amd64   default    20140116_21:37
plamo     5.x       i386    default    20140116_21:37
ubuntu    lucid     amd64   default    20140117_03:50
ubuntu    lucid     i386    default    20140117_03:50
ubuntu    precise   amd64   default    20140117_03:50
ubuntu    precise   armel   default    20140117_03:50
ubuntu    precise   armhf   default    20140117_03:50
ubuntu    precise   i386    default    20140117_03:50
ubuntu    quantal   amd64   default    20140117_03:50
ubuntu    quantal   armel   default    20140117_03:50
ubuntu    quantal   armhf   default    20140117_03:50
ubuntu    quantal   i386    default    20140117_03:50
ubuntu    raring    amd64   default    20140117_03:50
ubuntu    raring    armhf   default    20140117_03:50
ubuntu    raring    i386    default    20140117_03:50
ubuntu    saucy     amd64   default    20140117_03:50
ubuntu    saucy     armhf   default    20140117_03:50
ubuntu    saucy     i386    default    20140117_03:50
ubuntu    trusty    amd64   default    20140117_03:50
ubuntu    trusty    armhf   default    20140117_03:50
ubuntu    trusty    i386    default    20140117_03:50

The template has been carefully written to work on any system that has a POSIX compliant shell with wget. gpg is recommended but can be disabled if your host doesn’t have it (at your own risks).

The same template can be used against your own server, which I hope will be very useful for enterprise deployments to build templates in a central location and have them pulled by all the hosts automatically using our expiry mechanism to keep them fresh.

While the template was designed to workaround limitations of unprivileged containers, it works just as well with system containers, so even on a system that doesn’t support unprivileged containers you can do:

lxc-create -t download -n p1 -- -d ubuntu -r trusty -a amd64

And you’ll get a new container running the latest build of Ubuntu 14.04 amd64.

Using unprivileged LXC

Right, so let’s get you started, as I already mentioned, all the instructions below have only been tested on a very recent Ubuntu 14.04 (trusty) installation.
You may want to grab a daily build and run it in a VM.

Install the required packages:

  • sudo apt-get update
  • sudo apt-get dist-upgrade
  • sudo apt-get install lxc systemd-services uidmap

Then, assign yourself a set of uids and gids with:

  • sudo usermod –add-subuids 100000-165536 $USER
  • sudo usermod –add-subgids 100000-165536 $USER
  • sudo chmod +x $HOME

That last one is required because LXC needs it to access ~/.local/share/lxc/ after it switched to the mapped UIDs. If you’re using ACLs, you may instead use “u:100000:x” as a more specific ACL.

Now create ~/.config/lxc/default.conf with the following content:

lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:xx:xx:xx
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536

And /etc/lxc/lxc-usernet with:

<your username> veth lxcbr0 10

And that’s all you need. Now let’s create our first unprivileged container with:

lxc-create -t download -n p1 -- -d ubuntu -r trusty -a amd64

You should see the following output from the download template:

Setting up the GPG keyring
Downloading the image index
Downloading the rootfs
Downloading the metadata
The image cache is now ready
Unpacking the rootfs

---
You just created an Ubuntu container (release=trusty, arch=amd64).
The default username/password is: ubuntu / ubuntu
To gain root privileges, please use sudo.

So looks like your first container was created successfully, now let’s see if it starts:

ubuntu@trusty-daily:~$ lxc-start -n p1 -d
ubuntu@trusty-daily:~$ lxc-ls --fancy
NAME  STATE    IPV4     IPV6     AUTOSTART  
------------------------------------------
p1    RUNNING  UNKNOWN  UNKNOWN  NO

It’s running! At this point, you can get a console using lxc-console or can SSH to it by looking for its IP in the ARP table (arp -n).

One thing you probably noticed above is that the IP addresses for the container aren’t listed, that’s because unfortunately LXC currently can’t attach to an unprivileged container’s namespaces. That also means that some fields of lxc-info will be empty and that you can’t use lxc-attach. However we’re looking into ways to get that sorted in the near future.

There are also a few problems with job control in the kernel and with PAM, so doing a non-detached lxc-start will probably result in a rather weird console where things like sudo will most likely fail. SSH may also fail on some distros. A patch has been sent upstream for this, but I just noticed that it doesn’t actually cover all cases and even if it did, it’s not in any released version yet.

Quite a few more improvements to unprivileged containers are to come until the final 1.0 release next month and while we certainly don’t expect all workloads to be possible with unprivileged containers, it’s still a huge improvement on what we had before and a very good building block for a lot more interesting use cases.

Read more
Stéphane Graber

This is post 6 out of 10 in the LXC 1.0 blog post series.

When talking about container security most people either consider containers as inherently insecure or inherently secure. The reality isn’t so black and white and LXC supports a variety of technologies to mitigate most security concerns.

One thing to clarify right from the start is that you won’t hear any of the LXC maintainers tell you that LXC is secure so long as you use privileged containers. However, at least in Ubuntu, our default containers ship with what we think is a pretty good configuration of both the cgroup access and an extensive apparmor profile which prevents all attacks that we are aware of.

Below I’ll be covering the various technologies LXC supports to let you restrict what a container may do. Just keep in mind that unless you are using unprivileged containers, you shouldn’t give root access to a container to someone whom you’d mind having root access to your host.

Capabilities

The first security feature which was added to LXC was Linux capabilities support. With that feature you can set a list of capabilities that you want LXC to drop before starting the container or a full list of capabilities to retain (all others will be dropped).

The two relevant configurations options are:

  • lxc.cap.drop
  • lxc.cap.keep

Both are lists of capability names as listed in capabilities(7).

This may sound like a great way to make containers safe and for very specific cases it may be, however if running a system container, you’ll soon notice that dropping sys_admin and net_admin isn’t very practical and short of dropping those, you won’t make your container much safer (as root in the container will be able to re-grant itself any dropped capability).

In Ubuntu we use lxc.cap.drop to drop sys_module, mac_admin, mac_override, sys_time which prevent some known problems at container boot time.

Control groups

Control groups are interesting because they achieve multiple things which while interconnected are still pretty different:

  • Resource bean counting
  • Resource quotas
  • Access restrictions

The first two aren’t really security related, though resource quotas will let you avoid some obvious DoS of the host (by setting memory, cpu and I/O limits).

The last is mostly about the devices cgroup which lets you define which character and block devices a container may access and what it can do with them (you can restrict creation, read access and write access for each major/minor combination).

In LXC, configuring cgroups is done with the “lxc.cgroup.*” options which can roughly be defined as: lxc.cgroup.<controller>.<key> = <value>

For example to set a memory limit on p1 you’d add the following to its configuration:

lxc.cgroup.memory.limit_in_bytes = 134217728

This will set a memory limit of 128MB (the value is in bytes) and will be the equivalent to writing that same value to /sys/fs/cgroup/memory/lxc/p1/memory.limit_in_bytes

Most LXC templates only set a few devices controller entries by default:

# Default cgroup limits
lxc.cgroup.devices.deny = a
## Allow any mknod (but not using the node)
lxc.cgroup.devices.allow = c *:* m
lxc.cgroup.devices.allow = b *:* m
## /dev/null and zero
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm
## consoles
lxc.cgroup.devices.allow = c 5:0 rwm
lxc.cgroup.devices.allow = c 5:1 rwm
## /dev/{,u}random
lxc.cgroup.devices.allow = c 1:8 rwm
lxc.cgroup.devices.allow = c 1:9 rwm
## /dev/pts/*
lxc.cgroup.devices.allow = c 5:2 rwm
lxc.cgroup.devices.allow = c 136:* rwm
## rtc
lxc.cgroup.devices.allow = c 254:0 rm
## fuse
lxc.cgroup.devices.allow = c 10:229 rwm
## tun
lxc.cgroup.devices.allow = c 10:200 rwm
## full
lxc.cgroup.devices.allow = c 1:7 rwm
## hpet
lxc.cgroup.devices.allow = c 10:228 rwm
## kvm
lxc.cgroup.devices.allow = c 10:232 rwm

This configuration allows the container (usually udev) to create any device it wishes (that’s the wildcard “m” above) but block everything else (the “a” deny entry) unless it’s listed in one of the allow entries below. This covers everything a container will typically need to function.

You will find reasonably up to date documentation about the available controllers, control files and supported values at:
https://www.kernel.org/doc/Documentation/cgroups/

Apparmor

A little while back we added Apparmor profiles support to LXC.
The Apparmor support is rather simple, there’s one configuration option “lxc.aa_profile” which sets what apparmor profile to use for the container.

LXC will then setup the container and ask apparmor to switch it to that profile right before starting the container. Ubuntu’s LXC profile is rather complex as it aims to prevent any of the known ways of escaping a container or cause harm to the host.

As things are today, Ubuntu ships with 3 apparmor profiles meaning that the supported values for lxc.aa_profile are:

  • lxc-container-default (default value if lxc.aa_profile isn’t set)
  • lxc-container-default-with-nesting (same as default but allows some needed bits for nested containers)
  • lxc-container-default-with-mounting (same as default but allows mounting ext*, xfs and btrfs file systems).
  • unconfined (a special value which will disable apparmor support for the container)

You can also define your own by copying one of the ones in /etc/apparmor.d/lxc/, adding the bits you want, giving it a unique name, then reloading apparmor with “sudo /etc/init.d/apparmor reload” and finally setting lxc.aa_profile to the new profile’s name.

SELinux

The SELinux support is very similar to Apparmor’s. An SELinux context can be set using “lxc.se_context”.

An example would be:

lxc.se_context = unconfined_u:unconfined_r:lxc_t:s0-s0:c0.c1023

Similarly to Apparmor, LXC will switch to the new SELinux context right before starting init in the container. As far as I know, no distributions are setting a default SELinux context at this time, however most distributions build LXC with SELinux support (including Ubuntu, should someone choose to boot their host with SELinux rather than Apparmor).

Seccomp

Seccomp is a fairly recent kernel mechanism which allows for filtering of system calls.
As a user you can write a seccomp policy file and set it using “lxc.seccomp” in the container’s configuration. As always, this policy will only be applied to the running container and will allow or reject syscalls with a pre-defined return value.

An example (though limited and useless) of a seccomp policy file would be:

1
whitelist
103

Which would only allow syscall #103 (syslog) in the container and reject everything else.

Note that seccomp is a rather low level feature and only useful for some very specific use cases. All syscalls have to be referred by their ID instead of their name and those may change between architectures. Also, as things are today, if your host is 64bit and you load a seccomp policy file, all 32bit syscalls will be rejected. We’d need per-personality seccomp profiles to solve that but it’s not been a high priority so far.

User namespace

And last but not least, what’s probably the only way of making a container actually safe. LXC now has support for user namespaces. I’ll go into more details on how to use that feature in a later blog post but simply put, LXC is no longer running as root so even if an attacker manages to escape the container, he’d find himself having the privileges of a regular user on the host.

All this is achieved by assigning ranges of uids and gids to existing users. Those users on the host will then be allowed to clone a new user namespace in which all uids/gids are mapped to uids/gids that are part of the user’s range.

This obviously means that you need to allocate a rather silly amount of uids and gids to each user who’ll be using LXC in that way. In a perfect world, you’d allocate 65536 uids and gids per container and per user. As this would likely exhaust the whole uid/gid range rather quickly on some systems, I tend to go with “just” 65536 uids and gids per user that’ll use LXC and then have the same range shared by all containers.

Anyway, that’s enough details about user namespaces for now. I’ll cover how to actually set that up and use those unprivileged containers in the next post.

Read more
Stéphane Graber

This is post 5 out of 10 in the LXC 1.0 blog post series.

Storage backingstores

LXC supports a variety of storage backends (also referred to as backingstore).
It defaults to “none” which simply stores the rootfs under
/var/lib/lxc/<container>/rootfs but you can specify something else to lxc-create or lxc-clone with the -B option.

Currently supported values are:

directory based storage (“none” and “dir)

This is the default backingstore, the container rootfs is stored under
/var/lib/lxc/<container>/rootfs

The --dir option (when using “dir”) can be used to override the path.

btrfs

With this backingstore LXC will setup a new subvolume for the container which makes snapshotting much easier.

lvm

This one will use a new logical volume for the container.
The LV can be set with --lvname (the default is the container name).
The VG can be set with --vgname (the default is “lxc”).
The filesystem can be set with --fstype (the default is “ext4″).
The size can be set with --fssize (the default is “1G”).
You can also use LVM thinpools with --thinpool

overlayfs

This one is mostly used when cloning containers to create a container based on another one and storing any changes in an overlay.

When used with lxc-create it’ll create a container where any change done after its initial creation will be stored in a “delta0″ directory next to the container’s rootfs.

zfs

Very similar to btrfs, as I’ve not used either of those myself I can’t say much about them besides that it should also create some kind of subvolume for the container and make snapshots and clones faster and more space efficient.

Standard paths

One quick word with the way LXC usually works and where it’s storing its files:

  • /var/lib/lxc (default location for containers)
  • /var/lib/lxcsnap (default location for snapshots)
  • /var/cache/lxc (default location for the template cache)
  • $HOME/.local/share/lxc (default location for unprivileged containers)
  • $HOME/.local/share/lxcsnap (default location for unprivileged snapshots)
  • $HOME/.cache/lxc (default location for unprivileged template cache)

The default path, also called lxcpath can be overridden on the command line with the -P option or once and for all by setting “lxcpath = /new/path” in /etc/lxc/lxc.conf (or $HOME/.config/lxc/lxc.conf for unprivileged containers).

The snapshot directory is always “snap” appended to lxcpath so it’ll magically follow lxcpath. The template cache is unfortunately hardcoded and can’t easily be moved short of relying on bind-mounts or symlinks.

The default configuration used for all containers at creation time is taken from
/etc/lxc/default.conf (no unprivileged equivalent yet).
The templates themselves are stored in /usr/share/lxc/templates.

Cloning containers

All those backingstores only really shine once you start cloning containers.
For example, let’s take our good old “p1″ Ubuntu container and let’s say you want to make a usable copy of it called “p4″, you can simply do:

sudo lxc-clone -o p1 -n p4

And there you go, you’ve got a working “p4″ container that’ll be a simple copy of “p1″ but with a new mac address and its hostname properly set.

Now let’s say you want to do a quick test against “p1″ but don’t want to alter that container itself, yet you don’t want to wait the time needed for a full copy, you can simply do:

sudo lxc-clone -o p1 -n p1-test -B overlayfs -s

And there you go, you’ve got a new “p1-test” container which is entirely based on the “p1″ rootfs and where any change will be stored in the “delta0″ directory of “p1-test”.
The same “-s” option also works with lvm and btrfs (possibly zfs too) containers and tells lxc-clone to use a snapshot rather than copy the whole rootfs across.

Snapshotting

So cloning is nice and convenient, great for things like development environments where you want throw away containers. But in production, snapshots tend to be a whole lot more useful for things like backup or just before you do possibly risky changes.

In LXC we have a “lxc-snapshot” tool which will let you create, list, restore and destroy snapshots of your containers.
Before I show you how it works, please note that “lxc-snapshot” currently doesn’t appear to work with directory based containers. With those it produces an empty snapshot, this should be fixed by the time LXC 1.0 is actually released.

So, let’s say we want to backup our “p1-lvm” container before installing “apache2″ into it, simply run:

echo "before installing apache2" > snap-comment
sudo lxc-snapshot -n p1-lvm -c snap-comment

At which point, you can confirm the snapshot was created with:

sudo lxc-snapshot -n p1-lvm -L -C

Now you can go ahead and install “apache2″ in the container.

If you want to revert the container at a later point, simply use:

sudo lxc-snapshot -n p1-lvm -r snap0

Or if you want to restore a snapshot as its own container, you can use:

sudo lxc-snapshot -n p1-lvm -r snap0 p1-lvm-snap0

And you’ll get a new “p1-lvm-snap0″ container which will contain a working copy of “p1-lvm” as it was at “snap0″.

Read more
Stéphane Graber

This is post 4 out of 10 in the LXC 1.0 blog post series.

Running foreign architectures

By default LXC will only let you run containers of one of the architectures supported by the host. That makes sense since after all, your CPU doesn’t know what to do with anything else.

Except that we have this convenient package called “qemu-user-static” which contains a whole bunch of emulators for quite a few interesting architectures. The most common and useful of those is qemu-arm-static which will let you run most armv7 binaries directly on x86.

The “ubuntu” template knows how to make use of qemu-user-static, so you can simply check that you have the “qemu-user-static” package installed, then run:

sudo lxc-create -t ubuntu -n p3 -- -a armhf

After a rather long bootstrap, you’ll get a new p3 container which will be mostly running Ubuntu armhf. I’m saying mostly because the qemu emulation comes with a few limitations, the biggest of which is that any piece of software using the ptrace() syscall will fail and so will anything using netlink. As a result, LXC will install the host architecture version of upstart and a few of the networking tools so that the containers can boot properly.

stgraber@castiana:~$ file /bin/ls
/bin/ls: ELF 64-bit LSB  executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, ""BuildID[sha1]"" =e50e0a5dadb8a7f4eaa2fd715cacb9842e157dc7, stripped
stgraber@castiana:~$ sudo lxc-start -n p3 -d
stgraber@castiana:~$ sudo lxc-attach -n p3
root@p3:/# file /bin/ls
/bin/ls: ELF 32-bit LSB  executable, ARM, EABI5 version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, ""BuildID[sha1]"" =88ff013a8fd9389747fb1fea1c898547fb0f650a, stripped
root@p3:/# exit
stgraber@castiana:~$ sudo lxc-stop -n p3
stgraber@castiana:~$

Hooks

As we know people like to script their containers and that our configuration can’t always accommodate every single use case, we’ve introduced a set of hooks which you may use.

Those hooks are simple paths to an executable file which LXC will run at some specific time in the lifetime of the container. Those executables will also be passed a set of useful environment variables so they can easily know what container invoked them and what to do.

The currently available hooks are (details in lxc.conf(5)):

  • lxc.hook.pre-start (called before any initialization is done)
  • lxc.hook.pre-mount (called after creating the mount namespace but before mounting anything)
  • lxc.hook.mount (called after the mounts but before pivot_root)
  • lxc.hook.autodev (identical to mount but only called if using autodev)
  • lxc.hook.start (called in the container right before /sbin/init)
  • lxc.hook.post-stop (run after the container has been shutdown)
  • lxc.hook.clone (called when cloning a container into a new one)

Additionally each network section may also define two additional hooks:

  • lxc.network.script.up (called in the network namespace after the interface was created)
  • lxc.network.script.down (called in the network namespace before destroying the interface)

All of those hooks may be specified as many times as you want in the configuration so you can use each hooking point multiple times.

As a simple example, let’s add the following to our “p1″ container:

lxc.hook.pre-start = /var/lib/lxc/p1/pre-start.sh

And create the hook itself at /var/lib/lxc/p1/pre-start.sh:

#!/bin/sh
echo "arguments: $*" > /tmp/test
echo "environment:" >> /tmp/test
env | grep LXC >> /tmp/test

Make it executable (chmod 755) and then start the container.
Checking /tmp/test you should see:

arguments: p1 lxc pre-start
environment:
LXC_ROOTFS_MOUNT=/usr/lib/x86_64-linux-gnu/lxc
LXC_CONFIG_FILE=/var/lib/lxc/p1/config
LXC_ROOTFS_PATH=/var/lib/lxc/p1/rootfs
LXC_NAME=p1

Android containers

I’ve often been asked whether it was possible to run Android in an LXC container. Well, the short answer is yes. However it’s not very simple and it really depends on what you want to do with it.

The first thing you’ll need if you want to do this is get your machine to run an Android kernel, you’ll need to have any modules needed by Android built and loaded before you can start the container.

Once you have that, you’ll need to create a new container by hand.
Let’s put it in “/var/lib/lxc/android/”, in there, you need a configuration file similar to this one:

lxc.rootfs = /var/lib/lxc/android/rootfs
lxc.utsname = armhf

lxc.network.type = none

lxc.devttydir = lxc
lxc.tty = 4
lxc.pts = 1024
lxc.arch = armhf
lxc.cap.drop = mac_admin mac_override
lxc.pivotdir = lxc_putold

lxc.hook.pre-start = /var/lib/lxc/android/pre-start.sh

lxc.aa_profile = unconfined

/var/lib/lxc/android/pre-start.sh is where the interesting bits happen. It needs to be an executable shell script, containing something along the lines of:

#!/bin/sh
mkdir -p $LXC_ROOTFS_PATH
mount -n -t tmpfs tmpfs $LXC_ROOTFS_PATH

cd $LXC_ROOTFS_PATH
cat /var/lib/lxc/android/initrd.gz | gzip -d | cpio -i

# Create /dev/pts if missing
mkdir -p $LXC_ROOTFS_PATH/dev/pts

Then get the initrd for your device and place it in /var/lib/lxc/android/initrd.gz.

At that point, when starting the LXC container, the Android initrd will be unpacked on a tmpfs (similar to Android’s ramfs) and Android’s init will be started which in turn should mount any partition that Android requires and then start all of the usual services.

Because there are no apparmor, cgroup or even network configuration applied to it, the container will have a lot of rights and will typically completely crash the machine. You unfortunately have to be familiar with the way Android works and not be afraid to modify its init scripts if not even its init process to only start the bits you actually want.

I can’t provide a generic recipe there as it completely depends on what you’re interested on, what version of Android and what device you’re using. But it’s clearly possible to do and you may want to look at Ubuntu Touch to see how we’re doing it by default there.

One last note, Android’s init script isn’t in /sbin/init, so you need to tell LXC where to load it with:

lxc-start -n android -- /init

LXC on Android devices

So now that we’ve seen how to run Android in LXC, let’s talk about running Ubuntu on Android in LXC.

LXC has been ported to bionic (Android’s C library) and while not feature-equivalent with its glibc build, it’s still good enough to be used.

Unfortunately due to the kind of low level access LXC requires and the fact that our primary focus isn’t Android, installation could be easier…You won’t be finding LXC on the Google PlayStore and we won’t provide you with a .apk that you can install.

Instead every time something changes in the upstream git branch, we produce a new tarball which can be downloaded here: http://qa.linuxcontainers.org/master/current/android-armel/lxc-android.tar.gz

This build is known to work with Android >= 4.2 but will quite likely work on older versions too.

For this to work, you’ll need to grab your device’s kernel configuration and run lxc-checkconfig against it to see whether it’s compatible with LXC or not. Unfortunately it’s very likely that it won’t be… In that case, you’ll need to go hunt for the kernel source for your device, add the missing feature flags, rebuild it and update your device to boot your updated kernel.

As scary as this may sound, it’s usually not that difficult as long as your device is unlocked and you’re already using an alternate ROM like Cyanogen which usually make their kernel git tree easily available.

Once your device has a working kernel, all you need to do is unpack our tarball as root in your device’s / directory, copy an arm container to /data/lxc/containers/<container name>, get into /data/lxc and run “./run-lxc lxc-start -n <container name>”.
A few seconds later you’ll be greeted by a login prompt.

Read more
Stéphane Graber

This is post 3 out of 10 in the LXC 1.0 blog post series.

Exchanging data with a container

Because containers directly share their filesystem with the host, there’s a lot of things that can be done to pass data into a container or to get stuff out.

The first obvious one is that you can access the container’s root at:
/var/lib/lxc/<container name>/rootfs/

That’s great, but sometimes you need to access data that’s in the container and on a filesystem which was mounted by the container itself (such as a tmpfs). In those cases, you can use this trick:

sudo ls -lh /proc/$(sudo lxc-info -n p1 -p -H)/root/run/

Which will show you what’s in /run of the running container “p1″.

Now, that’s great to have access from the host to the container, but what about having the container access and write data to the host?
Well, let’s say we want to have our host’s /var/cache/lxc shared with “p1″, we can edit /var/lib/lxc/p1/fstab and append:

/var/cache/lxc var/cache/lxc none bind,create=dir

This line means, mount “/var/cache/lxc” from the host as “/var/cache/lxc” (the lack of initial / makes it relative to the container’s root), mount it as a bind-mount (“none” fstype and “bind” option) and create any directory that’s missing in the container (“create=dir”).

Now restart “p1″ and you’ll see /var/cache/lxc in there, showing the same thing as you have on the host. Note that if you want the container to only be able to read the data, you can simply add “ro” as a mount flag in the fstab.

Container nesting

One pretty cool feature of LXC (though admittedly not very useful to most people) is support for nesting. That is, you can run LXC within LXC with pretty much no overhead.

By default this is blocked in Ubuntu as allowing this at the moment requires letting the container mount cgroupfs which will let it escape any cgroup restrictions that’s applied to it. It’s not an issue in most environment, but if you don’t trust your containers at all, then you shouldn’t be using nesting at this point.

So to enable nesting for our “p1″ container, edit /var/lib/lxc/p1/config and add:

lxc.aa_profile = lxc-container-default-with-nesting

And then restart “p1″. Once that’s done, install lxc inside the container. I usually recommend using the same version as the host, though that’s not strictly required.

Once LXC is installed in the container, run:

sudo lxc-create -t ubuntu -n p1

As you’ve previously bind-mounted /var/cache/lxc inside the container, this should be very quick (it shouldn’t rebootstrap the whole environment). Then start that new container as usual.

At that point, you may now run lxc-ls on the host in nesting mode to see exactly what’s running on your system:

stgraber@castiana:~$ sudo lxc-ls --fancy --nesting
NAME    STATE    IPV4                 IPV6   AUTOSTART  
------------------------------------------------------
p1      RUNNING  10.0.3.82, 10.0.4.1  -      NO       
 \_ p1  RUNNING  10.0.4.7             -      NO       
p2      RUNNING  10.0.3.128           -      NO

There’s no real limit to the number of level you can go, though as fun as it may be, it’s hard to imagine why 10 levels of nesting would be of much use to anyone :)

Raw network access

In the previous post I mentioned passing raw devices from the host inside the container. One such container I use relatively often is when working with a remote network over a VPN. That network uses OpenVPN and a raw ethernet tap device.

I needed to have a completely isolated system access that VPN so I wouldn’t get mixed routes and it’d appear just like any other machine to the machines on the remote site.

All I had to do to make this work was set my container’s network configuration to:

lxc.network.type = phys
lxc.network.hwaddr = 00:16:3e:c6:0e:04
lxc.network.flags = up
lxc.network.link = tap0
lxc.network.name = eth0

Then all I have to do is start OpenVPN on my host which will connect and setup tap0, then start the container which will steal that interface and use it as its own eth0.The container will then use DHCP to grab an IP and will behave just like if it was a physical machine connect directly in the remote network.

Read more
Stéphane Graber

This is post 2 out of 10 in the LXC 1.0 blog post series.

More templates

So at this point you should have a working Ubuntu container that’s called “p1″ and was created using the default template called simply enough “ubuntu”.

But LXC supports much more than just standard Ubuntu. In fact, in current upstream git (and daily PPA), we support Alpine Linux, Alt Linux, Arch Linux, busybox, CentOS, Cirros, Debian, Fedora, OpenMandriva, OpenSUSE, Oracle, Plamo, sshd, Ubuntu Cloud and Ubuntu.

All of those can usually be found in /usr/share/lxc/templates. They also all typically have extra advanced options which you can get to by passing “--help” after the “lxc-create” call (the “--” is required to split “lxc-create” options from the template’s).

Writing extra templates isn’t too difficult, they basically are executables (all shell scripts but that’s not a requirement) which take a set of standard arguments and are expected to produce a working rootfs in the path that’s passed to them.

One thing to be aware of is that due to missing tools not all distros can be bootstrapped on all distros. It’s usually best to just try. We’re always interested in making those work on more distros even if that means using some rather weird tricks (like is done in the fedora template) so if you have a specific combination which doesn’t work at the moment, patches are definitely welcome!

Anyway, enough talking for now, let’s go ahead and create an Oracle Linux container that we’ll force to be 32bit.

sudo lxc-create -t oracle -n p2 -- -a i386

On most systems, this will initially fail, telling you to install the “rpm” package first which is needed for bootstrap reasons. So install it and “yum” and then try again.

After some time downloading RPMs, the container will be created, then it’s just a:

sudo lxc-start -n p2

And you’ll be greated by the Oracle Linux login prompt (root / root).

At that point since you started the container without passing “-d” to “lxc-start”, you’ll have to shut it down to get your shell back (you can’t detach from a container which wasn’t started initially in the background).

Now if you are wondering why Ubuntu has two templates. The Ubuntu template which I’ve been using so far does a local bootstrap using “debootstrap” basically building your container from scratch, whereas the Ubuntu Cloud template (ubuntu-cloud) downloads a pre-generated cloud image (identical to what you’d get on EC2 or other cloud services) and starts it. That image also includes cloud-init and supports the standard cloud metadata.

It’s a matter of personal choice which you like best. I personally have a local mirror so the “ubuntu” template is much faster for me and I also trust it more since I know everything was downloaded from the archive in front of me and assembled locally on my machine.

One last note on templates. Most of them use a local cache, so the initial bootstrap of a container for a given arch will be slow, any subsequent one will just be a local copy from the cache and will be much faster.

Auto-start

So what if you want to start a container automatically at boot time?

Well, that’s been supported for a long time in Ubuntu and other distros by using some init scripts and symlinks in /etc, but very recently (two days ago), this has now been implemented cleanly upstream.

So here’s how auto-started containers work nowadays:

As you may know, each container has a configuration file typically under
/var/lib/lxc/<container name>/config

That file is key = value with the list of valid keys being specified in lxc.conf(5).

The startup related values that are available are:

  • lxc.start.auto = 0 (disabled) or 1 (enabled)
  • lxc.start.delay = 0 (delay in second to wait after starting the container)
  • lxc.start.order = 0 (priority of the container, higher value means starts earlier)
  • lxc.group = group1,group2,group3,… (groups the container is a member of)

When your machine starts, an init script will ask “lxc-autostart” to start all containers of a given group (by default, all containers which aren’t in any) in the right order and waiting the specified time between them.

To illustrate that, edit /var/lib/lxc/p1/config and append those lines to the file:

lxc.start.auto = 1
lxc.group = ubuntu

And /var/lib/lxc/p2/config and append those lines:

lxc.start.auto = 1
lxc.start.delay = 5
lxc.start.order = 100

Doing that means that only the p2 container will be started at boot time (since only those without a group are by default), the order value won’t matter since it’s alone and the init script will wait 5s before moving on.

You may check what containers are automatically started using “lxc-ls”:

stgraber@castiana:~$ sudo lxc-ls --fancy
NAME    STATE    IPV4        IPV6                                    AUTOSTART     
---------------------------------------------------------------------------------
p1      RUNNING  10.0.3.128  2607:f2c0:f00f:2751:216:3eff:feb1:4c7f  YES (ubuntu)
p2      RUNNING  10.0.3.165  2607:f2c0:f00f:2751:216:3eff:fe3a:f1c1  YES

Now you can also manually play with those containers using the “lxc-autostart” command which let’s you start/stop/kill/reboot any container marked with lxc.start.auto=1.

For example, you could do:

sudo lxc-autostart -a

Which will start any container that has lxc.start.auto=1 (ignoring the lxc.group value) which in our case means it’ll first start p2 (because of order = 100), then wait 5s (because of delay = 5) and then start p1 and return immediately afterwards.

If at that point you want to reboot all containers that are in the “ubuntu” group, you may do:

sudo lxc-autostart -r -g ubuntu

You can also pass “-L” with any of those commands which will simply print which containers would be affected and what the delays would be but won’t actually do anything (useful to integrate with other scripts).

Freezing your containers

Sometimes containers may be running daemons that take time to shutdown or restart, yet you don’t want to run the container because you’re not actively using it at the time.

In such cases, “sudo lxc-freeze -n <container name>” can be used. That very simply freezes all the processes in the container so they won’t get any time allocated by the scheduler. However the processes will still exist and will still use whatever memory they used to.

Once you need the service again, just call “sudo lxc-unfreeze -n <container name>” and all the processes will be restarted.

Networking

As you may have noticed in the configuration file while you were setting the auto-start settings, LXC has a relatively flexible network configuration.
By default in Ubuntu we allocate one “veth” device per container which is bridged into a “lxcbr0″ bridge on the host on which we run a minimal dnsmasq dhcp server.

While that’s usually good enough for most people. You may want something slightly more complex, such as multiple network interfaces in the container or passing through physical network interfaces, … The details of all of those options are listed in lxc.conf(5) so I won’t repeat them here, however here’s a quick example of what can be done.

lxc.network.type = veth
lxc.network.hwaddr = 00:16:3e:3a:f1:c1
lxc.network.flags = up
lxc.network.link = lxcbr0
lxc.network.name = eth0

lxc.network.type = veth
lxc.network.link = virbr0
lxc.network.name = virt0

lxc.network.type = phys
lxc.network.link = eth2
lxc.network.name = eth1

With this setup my container will have 3 interfaces, eth0 will be the usual veth device in the lxcbr0 bridge, eth1 will be the host’s eth2 moved inside the container (it’ll disappear from the host while the container is running) and virt0 will be another veth device in the virbr0 bridge on the host.

Those last two interfaces don’t have a mac address or network flags set, so they’ll get a random mac address at boot time (non-persistent) and it’ll be up to the container to bring the link up.

Attach

Provided you are running a sufficiently recent kernel, that is 3.8 or higher, you may use the “lxc-attach” tool. It’s most basic feature is to give you a standard shell inside a running container:

sudo lxc-attach -n p1

You may also use it from scripts to run actions in the container, such as:

sudo lxc-attach -n p1 -- restart ssh

But it’s a lot more powerful than that. For example, take:

sudo lxc-attach -n p1 -e -s 'NETWORK|UTSNAME'

In that case, you’ll get a shell that says “root@p1″ (thanks to UTSNAME), running “ifconfig -a” from there will list the container’s network interfaces. But everything else will be that of the host. Also passing “-e” means that the cgroup, apparmor, … restrictions won’t apply to any processes started from that shell.

This can be very useful at times to spawn a software located on the host but inside the container’s network or pid namespace.

Passing devices to a running container

It’s great being able to enter and leave the container at will, but what about accessing some random devices on your host?

By default LXC will prevent any such access using the devices cgroup as a filtering mechanism. You could edit the container configuration to allow the right additional devices and then restart the container.

But for one-off things, there’s also a very convenient tool called “lxc-device”.
With it, you can simply do:

sudo lxc-device add -n p1 /dev/ttyUSB0 /dev/ttyS0

Which will add (mknod) /dev/ttyS0 in the container with the same type/major/minor as /dev/ttyUSB0 and then add the matching cgroup entry allowing access from the container.

The same tool also allows moving network devices from the host to within the container.

Read more
Stéphane Graber

This is post 1 out of 10 in the LXC 1.0 blog post series.

So what’s LXC?

Most of you probably already know the answer to that one, but here it goes:

“LXC is a userspace interface for the Linux kernel containment features.
Through a powerful API and simple tools, it lets Linux users easily create and manage system or application containers.”

I’m one of the two upstream maintainers of LXC along with Serge Hallyn.
The project is quite actively developed with milestones every month and a stable release coming up in February. It’s so far been developed by 67 contributors from a wide range of backgrounds and companies.

The project is mostly developed on github: http://github.com/lxc
We have a website at: http://linuxcontainers.org
And mailing lists at: http://lists.linuxcontainers.org

LXC 1.0

So what’s that 1.0 release all about?

Well, simply put it’s going to be the first real stable release of LXC and the first we’ll be supporting for 5 years with bugfix releases. It’s also the one which will be included in Ubuntu 14.04 LTS to be released in April 2014.

It’s also going to come with a stable API and a set of bindings, quite a few interesting new features which will be detailed in the next few posts and support for a wide range of host and guest distributions (including Android).

How to get it?

I’m assuming most of you will be using Ubuntu. For the next few posts, I’ll myself be using the current upstream daily builds on Ubuntu 14.04 but we maintain daily builds on 12.04, 12.10, 13.04, 13.10 and 14.04, so if you want the latest upstream code, you can use our PPA.

Alternatively, LXC is also directly in Ubuntu and quite usable since Ubuntu 12.04 LTS. You can choose to use the version which comes with whatever release you are on, or you can use one the backported version we maintain.

If you want to build it yourself, you can do (not recommended when you can simply use the packages for your distribution):

git clone git://github.com/lxc/lxc
cd lxc
sh autogen.sh
# You will probably want to run the configure script with --help and then set the paths
./configure
make
sudo make install

What about that first container?

Oh right, that was actually the goal of this post wasn’t it?

Ok, so now that you have LXC installed, hopefully using the Ubuntu packages, it’s really as simple as:

# Create a "p1" container using the "ubuntu" template and the same version of Ubuntu
# and architecture as the host. Pass "-- --help" to list all available options.
sudo lxc-create -t ubuntu -n p1

# Start the container (in the background)
sudo lxc-start -n p1 -d

# Enter the container in one of those ways## Attach to the container's console (ctrl-a + q to detach)
sudo lxc-console -n p1

## Spawn bash directly in the container (bypassing the console login), requires a >= 3.8 kernel
sudo lxc-attach -n p1

## SSH into it
sudo lxc-info -n p1
ssh ubuntu@<ip from lxc-info>

# Stop the container in one of those ways
## Stop it from within
sudo poweroff

## Stop it cleanly from the outside
sudo lxc-stop -n p1

## Kill it from the outside
sudo lxc-stop -n p1 -k

And there you go, that’s your first container. You’ll note that everything usually just works on Ubuntu. Our kernels have support for all the features that LXC may use and our packages setup a bridge and a DHCP server that the containers will use by default.
All of that is obviously configurable and will be covered in the coming posts.

Read more
Stéphane Graber

So it’s almost the end of the year, I’ve got about 10 days of vacation for the holidays and a bit of time on my hands.

Since I’ve been doing quite a bit of work on LXC lately in prevision for the LXC 1.0 release early next year, I thought that it’d be a good use of some of that extra time to blog about the current state of LXC.

As a result, I’m preparing a series of 10 blog posts covering what I think are some of the most exciting features of LXC. The planned structure is:

While they are all titled LXC 1.0, most of the things I’ll be showing will work just as well on older LXC. However some of the features will need a very very recent version of LXC (as in, current upstream git). I’ll try to make that clear and will explain how to use our stable backports in Ubuntu or current upstream snapshots from our PPA.

I’ll be updating this first blog post with links to all of the posts in the series. So if you want to bookmark or refer to these, please use this post.

Read more
Stéphane Graber

Anyone who met me probably knows that I like to run everything in containers.

A couple of weeks ago, I was attending the Ubuntu Developer Summit in Copenhagen, DK where I demoed how to run OpenGL code from within an LXC container. At that same UDS, all attendees also received a beta key for Steam on Linux.

Yesterday I finally received said key by e-mail and I’ve been experimenting with Steam a bit. Now, my laptop is running the development version of Ubuntu 13.04 and only has 64bit binaries. Steam is 32bit-only and Valve recommends running it on Ubuntu 12.04 LTS.

So I just spent a couple of hours writing a tool called steam-lxc which uses LXC’s new python API and a bunch more python magic to generate an Ubuntu 12.04 LTS 32bit container, install everything that’s needed to run Steam, then install Steam itself and configures some tricks to get direct GPU access and access to pulseaudio for sound.

All in all, it only takes 3 minutes for the script to setup everything I need to run Steam and then start it.

Here’s a (pretty boring) screencast of the script in action:

This script has only been tested with Intel hardware on Ubuntu 13.04 64bit at this point, but the PPA contains builds for Ubuntu 12.04 and Ubuntu 12.10 too.

To get it on your machine just do:

  • sudo apt-add-repository ppa:ubuntu-lxc/stable
  • sudo apt-get update
  • sudo apt-get install steam-lxc
  • sudo mkdir -p /var/lib/lxc /var/cache/lxc

Then once that’s all installed, set it up with sudo steam-lxc create. This can take somewhere from 5 minutes to an hour depending on your internet connection.

And once the environment is all setup, you can start steam with sudo steam-lxc run.

The code can be found at: https://code.launchpad.net/~ubuntu-lxc/lxc/steam-lxc

You can leave your feedback as comment here and if you want to improve the script, merge proposals are more than welcome.
I don’t have any hardware requiring proprietary drivers but I’d expect steam to fail on such hardware as the drivers won’t get properly installed in the container. Adding code to deal with those is pretty easy and I’d love to get some patches for that!

Have fun!

Read more
Stéphane Graber

(tl;dr: Edubuntu 14.04 will include a new Edubuntu Server and Edubuntu tablet edition with a lot of cool new features including a full feature Active Directory compatible domain.)

Now that Edubuntu 12.10 is out the door and the Ubuntu Developer Summit in Copenhagen is just a week away, I thought it’d be an appropriate time to share our vision for Edubuntu 14.04.

This was so far only discussed in person with Jonathan Carter and a bit on IRC with other Edubuntu developers but I think it’s time to make our plans a bit more visible so we can get more feedback and hopefully get interested people together next week at UDS.

There are three big topics I’d like to talk about. Edubuntu desktop, Edubuntu server and Edubuntu tablet.

Edubuntu desktop

Edubuntu desktop is what we’ve been offering since the first Edubuntu release and what we’ll obviously continue to offer pretty much as it’s today.
It’s not an area I plan on spending much time working on personally but I expect Jonathan to drive most of the work around this.

Basically what the Edubuntu desktop needs nowadays is a better application selection, better testing, better documentation, making sure our application selection works on all our supported platforms and is properly translated.

We’ll also have to refocus some of our efforts and will likely drop some things like our KDE desktop package that hasn’t been updated in years and was essentially doubling our maintenance work which is why we stopped supporting it officially in 12.04.

There are a lot of cool new tools we’ve heard of recently and that really should be packaged and integrated in Edubuntu.

Edubuntu Server

Edubuntu Server will be a new addition to the Edubuntu project, expected to ship in its final form in 14.04 and will be supported for 5 years as part of the LTS.

This is the area I’ll be spending most of my Edubuntu time on as it’s going to be using a lot of technologies I’ve been involved with over the years to offer what I hope will be an amazing server experience.

Edubuntu Server will essentially let you manage a network of Edubuntu, Ubuntu or Windows clients by creating a full featured domain (using samba4).

From the same install DVD as Edubuntu Desktop, you’ll be able to simply choose to install a new Edubuntu Server and create a new domain, or if you already have an Edubuntu domain or even an Active Directory domain, you’ll be able to join an extra server to add extra scalibility or high-availability.

On top of that core domain feature, you’ll be able to add extra roles to your Edubuntu Server, the initial list is:

  • Web hosting platform – Will let you deploy new web services using JuJu so schools in your district or individual teachers can easily get their own website.
  • File server – A standard samba3 file server so all your domain members can easily store and retrieve files.
  • Backup server – Will automatically backup the important data from your servers and if you wish, from your clients too.
  • Schooltool – A school management web service, taking care of all the day to day school administration.

LTSP will also be part of that system as part of Edubuntu Terminal Server which will let you, still from our single install media, install as many new terminal servers as you want, automatically joining the domain, using the centralized authentication, file storage and backup capabilities of your Edubuntu Server.

As I mentioned, the Edubuntu DVD will let you install Edubuntu Desktop, Edubuntu Server and Edubuntu Terminal Server. You’ll simply be asked at installation time whether you want to join an Edubuntu Server or Active Directory domain or if you want your machine to be standalone.

Once installed, Edubuntu Server will be managed through a web interface driving LXC behind the scene to deploy new services, upgrade individual services or deploy new web services using JuJu.
Our goal is to have Edubuntu Server offer an appliance-like experience, never requiring any command line access to the system and easily supporting upgrades from a version to another.

For those wondering what the installation process will look like, I have some notes of the changes available at: http://paste.ubuntu.com/1289041/
I’m expecting to have the installer changes implemented by the time we start building our first 13.04 images.

The rest of Edubuntu Server will be progressively landing during the 13.04 cycle with an early version of the system being released with Edubuntu 13.04, possibly with only a limited selection of roles and without initial support for multiple servers and Active Directory integration.

While initially Edubuntu branded, our hope is that this work will be re-usable by Ubuntu and may one day find its way into Ubuntu Server.
Doing this as part of Edubuntu will give us more time and more flexibility to get it right, build a community around it and get user feedback before we try to get the rest of the world to use it too.

Edubuntu Tablet

During the Edubuntu 12.10 development cycle, the Edubuntu Council approved the sponsorship of 5 tablets by Revolution Linux which were distributed to some of our developers.

We’ve been doing daily armhf builds of Edubuntu, refined our package selections to properly work on ARM and spent countless hours fighting to get our tablet to boot (a ZaTab from ZaReason).
Even though it’s been quite a painful experience so far, we’re still planning on offering a supported armhf tablet image for 14.04, running something very close to our standard Edubuntu Desktop and also featuring integration with Edubuntu Server.

With all the recent news about Ubuntu on the Nexus 7, we’ll certainly be re-discussing what our main supported platform will be during next week’s UDS but we’re certainly planning on releasing 13.04 with experimental tablet support.

LTS vs non-LTS

For those who read our release announcement or visited our website lately, you certainly noticed the emphasis on using the LTS releases.
We really think that most Edubuntu users want something that’s stable, very well tested with regular updates and a long support time, so we’re now always recommending the use of the latest LTS release.

That doesn’t mean we’ll stop doing non-LTS release like the Mythbuntu folks recently decided to do, pretty far from that. What it means however is that we’ll more freely experiment in non-LTS releases so we can easily iterate through our ideas and make sure we release something well polished and rock solid for our LTS releases.

Conclusion

I’m really really looking forward to Edubuntu 14.04. I think the changes we’re planning will help our users a lot and will make it easier than ever to get school districts and individual schools to switch to Edubuntu for both their backend infrastructure with Edubuntu Server and their clients with Edubuntu Desktop and Edubuntu Tablet.

Now all we need is your ideas and if you have some, your time to make it all happen. We usually hang out in #edubuntu on freenode and can also be contacted on the edubuntu-devel mailing-list.

For those of you going to UDS, we’ll try to get an informational session on Edubuntu Server scheduled on top of our usual Edubuntu session. If you’re there and want to know more or want to help, please feel free to grab Jonathan or I in the hallway, at the bar or at one of the evening activities.

Read more
Stéphane Graber

One of our top goals for LXC upstream work during the Ubuntu 12.10 development cycle was reworking the LXC library and turn it from a private library mostly used by the other lxc-* commands into something that’s easy for developers to work with and is accessible from other languages with some bindings.

Although the current implementation isn’t complete enough to consider the API stable and some changes will still happen to it over the months to come, we have pushed the initial implementation to the LXC staging branch on github and put it into the lxc package of Ubuntu 12.10.

The initial version comes with a python3 binding packaged as python3-lxc, that’s what I’ll use now to give you an idea of what’s possible with the API. Note that as we don’t have full user namespaces support at the moment, any code using the LXC API needs to run as root.

First, let’s start with the basics, creating a container, starting it, getting its IP and stopping it:

#!/usr/bin/python3
import lxc
container = lxc.Container("my_container")
container.create("ubuntu", {"release": "precise", "architecture": "amd64"})
container.start()
print(container.get_ips(timeout=10))
container.shutdown(timeout=10)
container.destroy()

So, pretty simple.
It’s also possible to modify the container’s configuration using the .get_config_item(key) and .set_config_item(key, value) functions. For those keys supporting multiple values, a list will be returned and a list will be accepted as a value by .set_config_item.

Network configuration can be accessed through the .network property which is essentially a list of all network interfaces of the container, properties can be changed that way or through .set_config_item and saved to the config file with .save_config().

The API isn’t terribly well documented at this point, help messages are present for all functions but there’s no generated html help yet.

To get a better idea of the functions exported by the API, you may want to look at the API test script. This script uses all the functions and properties exported by the python module so it should be a reasonable reference.

Read more
Stéphane Graber

With the DNS changes in Ubuntu 12.04, most development machines running with libvirt and lxc end up running quite a few DNS servers.

These DNS servers work fine when queried from a system on their network, but aren’t integrated with the main dnsmasq instance and so won’t let you resolve your VM and containers from outside of their respective networks.

One way to solve that is to install yet another DNS resolver and use it to redirect between the various dnsmasq instances. That can quickly become tricky to setup and doesn’t integrate too well with resolvconf and NetworkManager.

Seeing a lot of people wondering how to solve that problem, I took a few minutes yesterday to come up with an ssh configuration that’d allow one to access their containers and VM using their name.

The result is the following, to add to your ~/.ssh/config file:

Host *.lxc
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null
  ProxyCommand nc $(host $(echo %h | sed "s/\.lxc//g") 10.0.3.1 | tail -1 | awk '{print $NF}') %p

Host *.libvirt
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null
  ProxyCommand nc $(host $(echo %h | sed "s/\.libvirt//g") 192.168.122.1 | tail -1 | awk '{print $NF}') %p

After that, things like:

  • ssh user@myvm.libvirtu
  • ssh ubuntu@mycontainer.lxc

Will just work.

For LXC, you may also want to add a “User ubuntu” line to that config as it’s the default user for LXC containers on Ubuntu.
If you configured your bridges with a non-default subnet, you’ll also need to update the IPs or add more sections to the config.

These also turn off StrictHostKeyChecking and UserKnownHostsFile as my VMs and containers are local to my machine (reducing risk of MITM attacks) and tend to exist only for a few hours, to then be replaced by a completely different one with a different SSH host key. Depending on your setup, you may want to remove these lines.

Read more
Stéphane Graber

Quite a few people have been asking for a status update of LXC in Ubuntu as of Ubuntu 12.04 LTS. This post is meant as an overview of the work we did over the past 6 months and pointers to more detailed blog posts for some of the new features.

What’s LXC?

LXC is a userspace tool controlling the kernel namespaces and cgroup features to create system or application containers.

To give you an idea:

  • Feels like somewhere between a chroot and a VM
  • Can run a full distro using the “host” kernel
  • Processes running in a container are visible from the outside
  • Doesn’t require any specific hardware, works on all supported architectures

A libvirt driver for LXC exists (libvirt-lxc), however it doesn’t use the “lxc” userspace tool even though it uses the same kernel features.

Making LXC easier

One of the main focus for 12.04 LTS was to make LXC dead easy to use, to achieve this, we’ve been working on a few different fronts fixing known bugs and improving LXC’s default configuration.

Creating a basic container and starting it on Ubuntu 12.04 LTS is now down to:

sudo apt-get install lxc
sudo lxc-create -t ubuntu -n my-container
sudo lxc-start -n my-container

This will default to using the same version and architecture as your machine, additional option are obviously available (–help will list them). Login/Password are ubuntu/ubuntu.

Another thing we worked on to make LXC easier to work with is reducing the number of hacks required to turn a regular system into a container down to zero.
Starting with 12.04, we don’t do any modification to a standard Ubuntu system to get it running in a container.
It’s now even possible to take a raw VM image and have it boot in a container!

The ubuntu-cloud template also lets you get one of our EC2/cloud images and have it start as a container instead of a cloud instance:

sudo apt-get install lxc cloud-utils
sudo lxc-create -t ubuntu-cloud -n my-cloud-container
sudo lxc-start -n my-cloud-container

And finally, if you want to test the new cool stuff, you can also use juju with LXC:

[ ! -f ~/.ssh/id_rsa.pub ] && ssh-keygen -t rsa
sudo apt-get install juju apt-cacher-ng zookeeper lxc libvirt-bin --no-install-recommends
sudo adduser $USER libvirtd
juju bootstrap
sed -i "s/ec2/local/" ~/.juju/environments.yaml
echo " data-dir: /tmp/juju" >> ~/.juju/environments.yaml
juju bootstrap
juju deploy mysql
juju deploy wordpress
juju add-relation wordpress mysql
juju expose wordpress

# To tail the logs
juju debug-log

# To get the IPs and status
juju status

Making LXC safer

Another main focus for LXC in Ubuntu 12.04 was to make it safe. John Johansen did an amazing work of extending apparmor to let us implement per-container apparmor profiles and prevent most known dangerous behaviours from happening in a container.

NOTE: Until we have user namespaces implemented in the kernel and used by the LXC we will NOT say that LXC is root safe, however the default apparmor profile as shipped in Ubuntu 12.04 LTS is blocking any armful action that we are aware of.

This mostly means that write access to /proc and /sys are heavily restricted, mounting filesystems is also restricted, only allowing known-safe filesystems to be mounted by default. Capabilities are also restricted in the default LXC profile to prevent a container from loading kernel modules or control apparmor.

More details on this are available here:

Other cool new stuff

Emulated architecture containers

It’s now possible to use qemu-user-static with LXC to run containers of non-native architectures, for example:

sudo apt-get install lxc qemu-user-static
sudo lxc-create -n my-armhf-container -t ubuntu -- -a armhf
sudo lxc-start -n my-armhf-container

Ephemeral containers

Quite a bit of work also went into lxc-start-ephemeral, the tool letting you start a copy of an existing container using an overlay filesystem, discarding any change you make on shutdown:

sudo apt-get install lxc
sudo lxc-create -n my-container -t ubuntu
sudo lxc-start-ephemeral -o my-container

Container nesting

You can now start a container inside a container!
For that to work, you first need to create a new apparmor profile as the default one doesn’t allow this for security reason.
I already did that for you, so the few commands below will download it and install it in /etc/apparmor.d/lxc/lxc-with-nesting. This profile (or something close to it) will ship in Ubuntu 12.10 as an example of alternate apparmor profile for container.

sudo apt-get install lxc
sudo lxc-create -t ubuntu -n my-host-container -t ubuntu
sudo wget https://www.stgraber.org/download/lxc-with-nesting -O /etc/apparmor.d/lxc/lxc-with-nesting
sudo /etc/init.d/apparmor reload
sudo sed -i "s/#lxc.aa_profile = unconfined/lxc.aa_profile = lxc-container-with-nesting/" /var/lib/lxc/my-host-container/config
sudo lxc-start -n my-host-container
(in my-host-container) sudo apt-get install lxc
(in my-host-container) sudo stop lxc
(in my-host-container) sudo sed -i "s/10.0.3/10.0.4/g" /etc/default/lxc
(in my-host-container) sudo start lxc
(in my-host-container) sudo lxc-create -n my-sub-container -t ubuntu
(in my-host-container) sudo lxc-start -n my-sub-container

Documentation

Outside of the existing manpages and blog posts I mentioned throughout this post, Serge Hallyn did a very good job at creating a whole section dedicated to LXC in the Ubuntu Server Guide.
You can read it here: https://help.ubuntu.com/12.04/serverguide/lxc.html

Next steps

Next week we have the Ubuntu Developer Summit in Oakland, CA. There we’ll be working on the plans for LXC in Ubuntu 12.10. We currently have two sessions scheduled:

If you want to make sure the changes you want will be in Ubuntu 12.10, please make sure to join these two sessions. It’s possible to participate remotely to the Ubuntu Developer Summit, through IRC and audio streaming.

My personal hope for LXC in Ubuntu 12.10 is to have a clean liblxc library that can be used to create bindings and be used in languages like python. Working towards that goal should make it easier to do automated testing of LXC and cleanup our current tools.

I hope this post made you want to try LXC or for existing users, made you discover some of the new features that appeared in Ubuntu 12.04. We’re actively working on improving LXC both upstream and in Ubuntu, so do not hesitate to report bugs (preferably with “ubuntu-bug lxc”).

Read more
Stéphane Graber

One thing that we’ve been working on for LXC in 12.04 is getting rid of any remaining LXC specific hack in our templates. This means that you can now run a perfectly clean Ubuntu system in a container without any change.

To better illustrate that, here’s a guide on how to boot a standard Ubuntu VM in a container.

First, you’ll need an Ubuntu VM image in raw disk format. The next few steps also assume a default partitioning where the first primary partition is the root device. Make sure you have the lxc package installed and up to date and lxcbr0 enabled (the default with recent LXC).

Then run kpartx -a vm.img this will create loop devices in /dev/mapper for your VM partitions, in the following configuration I’m assuming /dev/mapper/loop0p1 is the root partition.

Now write a new LXC configuration file (myvm.conf in my case) containing:

lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = lxcbr0
lxc.utsname = myvminlxc

lxc.tty = 4
lxc.pts = 1024
lxc.rootfs = /dev/mapper/loop0p1
lxc.arch = amd64
lxc.cap.drop = sys_module mac_admin

lxc.cgroup.devices.deny = a
# Allow any mknod (but not using the node)
lxc.cgroup.devices.allow = c *:* m
lxc.cgroup.devices.allow = b *:* m
# /dev/null and zero
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm
# consoles
lxc.cgroup.devices.allow = c 5:1 rwm
lxc.cgroup.devices.allow = c 5:0 rwm
#lxc.cgroup.devices.allow = c 4:0 rwm
#lxc.cgroup.devices.allow = c 4:1 rwm
# /dev/{,u}random
lxc.cgroup.devices.allow = c 1:9 rwm
lxc.cgroup.devices.allow = c 1:8 rwm
lxc.cgroup.devices.allow = c 136:* rwm
lxc.cgroup.devices.allow = c 5:2 rwm
# rtc
lxc.cgroup.devices.allow = c 254:0 rwm
#fuse
lxc.cgroup.devices.allow = c 10:229 rwm
#tun
lxc.cgroup.devices.allow = c 10:200 rwm
#full
lxc.cgroup.devices.allow = c 1:7 rwm
#hpet
lxc.cgroup.devices.allow = c 10:228 rwm
#kvm
lxc.cgroup.devices.allow = c 10:232 rwm

The bits in bold may need updating if you’re not using the same architecture, partition scheme or bridges as I’m.

Then finally, run: lxc-start -n myvminlxc -f myvm.conf

And watch your VM boot in an LXC container.

I did this test with a desktop VM using network manager so it didn’t mind LXC’s random MAC address, server VMs might get stuck for a minute at boot time because of that though.
In such case, either clean /etc/udev/rules.d/70-persistent-net.rules or set “lxc.network.hwaddr” to the same mac address as your VM.

Once done, run kpartx -d vm.img to remove the loop devices.

Read more
Stéphane Graber

It took a while to get some apt resolver bugs fixed, a few packages marked for multi-arch and some changes in the Ubuntu LXC template, but since yesterday, you can now run (using up to date Precise):

  • sudo apt-get install lxc qemu-user-static
  • sudo lxc-create -n armhf01 -t ubuntu — -a armhf -r precise
  • sudo lxc-start -n armhf01
  • Then login with root as both login and password

And enjoy an armhf system running on your good old x86 machine.

Now, obviously it’s pretty far from what you’d get on real ARM hardware.
It’s using qemu’s user space CPU emulation (qemu-user-static), so won’t be particularly fast, will likely use a lot of CPU and may give results pretty different from what you’d expect on real hardware.

Also, because of limitations in qemu-user-static, a few packages from the “host” architecture are installed in the container. These are mostly anything that requires the use of ptrace (upstart) or the use of netlink (mountall, iproute and isc-dhcp-client).
This is the bare minimum I needed to install to get the rest of the container to work using armhf binaries. I obviously didn’t test everything and I’m sure quite a few other packages will fail in such environment.

This feature should be used as an improvement on top of a regular armhf chroot using qemu-user-static and not as a replacement for actual ARM hardware (obviously), but it’s cool to have around and nice to show what LXC can do.

I confirmed it to work for armhf and armel, powerpc should also work, though it didn’t succeed to debootstrap when I tried it earlier today.

Enjoy!

Read more
Stéphane Graber

One of my focus for this cycle is to get Ubuntu’s support for complex networking working in a predictable way. The idea was to review exactly what’s happening at boot time, get a list of possible scenario that are used on servers in corporate environment and make sure these always work.

Bonding

Bonding basically means aggregating multiple physical link into one virtual link for high availability and load balancing. There are different ways of setting up such a link though the industry standard is 802.3ad (LACP – Link Aggregation Control Protocol). In that mode your server will negotiate with your switch to establish an aggregate link, then send monitoring packets to detect failure. LACP also does load balancing (MAC, IP and protocol based depending on hardware support).

One problem we had since at least Ubuntu 10.04 LTS is that Ubuntu’s boot sequence is event based, including bringing up network interfaces. The good old “ifup -a” is only done at the end of the boot sequence to try and fix anything that wasn’t brought up through events.

Unfortunately that meant that if your server takes a long time to detect the hardware, your bond would be initialised before the network cards have been detected, giving you a bond0 without a MAC address, making DHCP queries fail in pretty weird ways and making bridging or tagging fail with “Operation not permitted”.
As that all depends on hardware detection timing, it was racy, giving you random results at boot time.

Thankfully that should now be all fixed in 12.04, the new ifenslave-2.6 I uploaded a few weeks ago now initialises the bond whenever the first slave appears. If no slave appeared by the time we get to the catch-all “ifup -a”, it’ll simply wait for up to an additional minute for a slave to appear before giving up and continuing the boot sequence.
To avoid another common race condition where a bridge is brought up with a bond as one of its members before the bond is ready, ifenslave will now detect a bond is part of a bridge and add it only once ready.

Tagging

Another pretty common thing on corporate networks is the use of VLANs (802.1q), letting you create up to 4096 virtual networks on one link.
In the past, Ubuntu would rely on the catch all “ifup -a” to create any required vlan interface, once again, that’s a problem when an interface that depends on that vlan interface is initialised before the vlan interface is created.

To fix that, Ubuntu 12.04′s vlan package now ships with a udev rule that triggers the creation of the vlan interface whenever its parent interface is created.

Bridging

Bridging on Linux can be seen as creating a virtual switch on your system (including STP support).

Bridges have been working pretty well for a while on Ubuntu as we’ve been shipping a udev rule similar to the one for vlans for a few releases already. Members are simply added to the bridge as they appear on the system. The changes to ifenslave and the vlan package make sure that even bond interfaces with VLANs get properly added to bridges.

Complex network configuration example

My current test setup for networking on Ubuntu 12.04 is actually something I’ve been using on my network for years.

As you may know, I’m also working on LXC (Linux Containers), so my servers usually run somewhere between 15 and 80 containers, each of these container has a virtual ethernet interface that’s bridged.
I have one bridge per network zone, each of these network zone being a separate VLAN. These VLANs are created on top of a two gigabit link bond.

At boot time, the following happens (roughly):

  1. One of the two network interfaces appear
  2. The bond is initialised and the first interface is enslaved
  3. This triggers the creation of all the VLAN interfaces
  4. Creating the VLAN interfaces triggers the creation of all the bridges
  5. All the VLAN interfaces are added to their respective bridge
  6. The other network interface appear and gets added to the bond

My /etc/network/interfaces can be found here:
http://www.stgraber.org/download/complex-interfaces

This contains the very strict minimum needed for LACP to work. One thing worth noting is that the two physical interfaces are listed before bond0, this is to ensure that even if the events don’t work and we have to rely on the fallback “ifup -a”, the interfaces will be initialised in the right order avoiding the 60s delay.

Please note that this example will only reliably work with Ubuntu Precise (to become 12.04 LTS). It’s still a correct configuration for previous releases but race conditions may give you a random result.

I’ll be trying to push these changes to Ubuntu 11.10 as they are pretty easy to backport there, however it’d be much harder and very likely dangerous to backport these to even older releases.
For these, the only recommendation I can give is to add some “pre-up sleep 5″ or similar to your bridges and vlan interfaces to make sure whatever interface they depend on exists and is ready by the time the “ifup -a” call is reached.

IPv6

Another interesting topic for 12.04 is IPv6, as a release that’ll be supported for 5 years on both servers and desktops, it’s important to get IPv6 right.

Ubuntu 12.04 LTS will be the first Ubuntu release shipping with IPv6 private extensions turned on by default. Ubuntu 11.10 already brought most of what’s needed for IPv6 support on the desktop and in the installer, supporting SLAAC (stateless autoconfiguration), stateless DHCPv6 and stateful DHCPv6.

Once we get a new ifupdown in Ubuntu Precise, we’ll have full support for IPv6 also for people that aren’t using Network Manager (mostly servers) which should at this point give us support for any IPv6 setup you may find.

The userspace has been working pretty well with IPv6 for years. I recently made my whole network dual-stack and now have all my servers and services defaulting to IPv6 for a total of 40% of my network traffic (roughly 1.5TB a month of IPv6 traffic). The only user space related problem I noticed is the lack of IPv6 support in Nagios’ nrpe plugin, meaning I can’t start converting servers to single stack IPv6 as I’d loose monitoring …

I also wrote a python script using pypcap to give me the percentage of ipv6 and ipv4 traffic going through a given interface, the script can be found here: http://www.stgraber.org/download/v6stats.py (start with: python v6stats.py eth0)

What now ?

At this point, I think Ubuntu Precise is pretty much ready as far as networking is concerned. The only remaining change is the new ifupdown and the small installer change that comes with it for DHCPv6 support.

If you have a spare machine or VM, now is the time to grab a recent build of Ubuntu Precise and make sure whatever network configuration you use in production works reliably for you and that any hack you needed in the past can now be dropped.
If your configuration doesn’t work at all or fails randomly, please file a bug so we can have a look at your configuration and fix any bug (or documentation).

Read more