Canonical Voices

Posts tagged with 'lxc'

Dustin Kirkland


As always, I enjoyed speaking at the SCALE14x event, especially at the new location in Pasadena, California!

What if you could adapt a package from a newer version of Ubuntu, onto your stable LTS desktop/server?

Or, as a developer, what if you could provide your latest releases to your users running an older LTS version of Ubuntu?

Introducing adapt!

adapt is a lot like apt...  It’s a simple command that installs packages.

But it “adapts” a requested version to run on your current system.

It's a simple command that installs any package from any release of Ubuntu into any version of Ubuntu.

How does adapt work?

Simple… Containers!

More specifically, LXD system containers.

Why containers?

Containers can run anywhere, physical, virtual, desktops, servers, and any CPU architecture.

And containers are light and fast!  Zero latency and no virtualization overhead.

Most importantly, system containers are perfect copies of the released distribution, the operating system itself.

And all of that continuous integration testing we do perform on every single Ubuntu release?

We leverage that!
You can download a PDF of the slides for my talk here, or flip through them here:



I hope you enjoy some of the magic that LXD is making possible ;-)

Cheers!
Dustin

Read more
pitti

The last two major autopkgtest releases (3.18 from November, and 3.19 fresh from yesterday) bring some new features that are worth spreading.

New LXD virtualization backend

3.19 debuts the new adt-virt-lxd virtualization backend. In case you missed it, LXD is an API/CLI layer on top of LXC which introduces proper image management, seamlessly use images and containers on remote locations, intelligently caching them locally, automatically configure performant storage backends like zfs or btrfs, and just generally feels really clean and much simpler to use than the “classic” LXC.

Setting it up is not complicated at all. Install the lxd package (possibly from the backports PPA if you are on 14.04 LTS), and add your user to the lxd group. Then you can add the standard LXD image server with

  lxc remote add lco https://images.linuxcontainers.org:8443

and use the image to run e. g. the libpng test from the archive:

  adt-run libpng --- lxd lco:ubuntu/trusty/i386
  adt-run libpng --- lxd lco:debian/sid/amd64

The adt-virt-lxd.1 manpage explains this in more detail, also how to use this to run tests in a container on a remote host (how cool is that!), and how to build local images with the usual autopkgtest customizations/optimizations using adt-build-lxd.

I have btrfs running on my laptop, and LXD/autopkgtest automatically use that, so the performance really rocks. Kudos to Stéphane, Serge, Tycho, and the other LXD authors!

The motivation for writing this was to make it possible to move our armhf testing into the cloud (which for $REASONS requires remote containers), but I now have a feeling that soon this will completely replace the existing adt-virt-lxc virt backend, as its much nicer to use.

It is covered by the same regression tests as the LXC runner, and from the perspective of package tests that you run in it it should behave very similar to LXC. The one problem I’m aware of is that autopkgtest-reboot-prepare is broken, but hardly anything is using that yet. This is a bit complicated to fix, but I expect it will be in the next few weeks.

MaaS setup script

While most tests are not particularly sensitive about which kind of hardware/platform they run on, low-level software like the Linux kernel, GL libraries, X.org drivers, or Mir very much are. There is a plan for extending our automatic tests to real hardware for these packages, and being able to run autopkgtests on real iron is one important piece of that puzzle.

MaaS (Metal as a Service) provides just that — it manages a set of machines and provides an API for installing, talking to, and releasing them. The new maas autopkgtest ssh setup script (for the adt-virt-ssh backend) brings together autopkgtest and real hardware. Once you have a MaaS setup, get your API key from the web UI, then you can run a test like this:

  adt-run libpng --- ssh -s maas -- \
     --acquire "arch=amd64 tags=touchscreen" -r wily \
     http://my.maas.server/MAAS 123DEADBEEF:APIkey

The required arguments are the MaaS URL and the API key. Without any further options you will get any available machine installed with the default release. But usually you want to select a particular one by architecture and/or tags, and install a particular distro release, which you can do with the -r/--release and --acquire options.

Note that this is not wired into Ubuntu’s production CI environment, but it will be.

Selectively using packages from -proposed

Up until a few weeks ago, autopkgtest runs in the CI environment were always seeing/using the entirety of -proposed. This often led to lockups where an application foo and one of its dependencies libbar got a new version in -proposed at the same time, and on test regressions it was not clear at all whose fault it was. This often led to perfectly good packages being stuck in -proposed for a long time, and a lot of manual investigation about root causes.

.

These days we are using a more fine-grained approach: A test run is now specific for a “trigger”, that is, the new package in -proposed (e. g. a new version of libbar) that caused the test (e. g. for “foo”) to run. autopkgtest sets up apt pinning so that only the binary packages for the trigger come from -proposed, the rest from -release. This provides much better isolation between the mush of often hundreds of packages that get synced or uploaded every day.

This new behaviour is controlled by an extension of the --apt-pocket option. So you can say

  adt-run --apt-pocket=proposed=src:foo,libbar1,libbar-data ...

and then only the binaries from the foo source, libbar1, and libbar-data will come from -proposed, everything else from -release.

Caveat:Unfortunately apt’s pinning is rather limited. As soon as any of the explicitly listed packages depends on a package or version that is only available in -proposed, apt falls over and refuses the installation instead of taking the required dependencies from -proposed as well. In that case, adt-run falls back to the previous behaviour of using no pinning at all. (This unfortunately got worse with apt 1.1, bug report to be done). But it’s still helpful in many cases that don’t involve library transitions or other package sets that need to land in lockstep.

Unified testbed setup script

There is a number of changes that need to be made to testbeds so that tests can run with maximum performance (like running dpkg through eatmydata, disabling apt translations, or automatically using the host’s apt-cacher-ng), reliable apt sources, and in a minimal environment (to detect missing dependencies and avoid interference from unrelated services — these days the standard cloud images have a lot of unnecessary fat). There is also a choice whether to apply these only once (every day) to an autopkgtest specific base image, or on the fly to the current ephemeral testbed for every test run (via --setup-commands). Over time this led to quite a lot of code duplication between adt-setup-vm, adt-build-lxc, the new adt-build-lxd, cloud-vm-setup, and create-nova-image-new-release.

I now cleaned this up, and there is now just a single setup-commands/setup-testbed script which works for all kinds of testbeds (LXC, LXD, QEMU images, cloud instances) and both for preparing an image with adt-buildvm-ubuntu-cloud, adt-build-lx[cd] or nova, and with preparing just the current ephemeral testbed via --setup-commands.

While this is mostly an internal refactorization, it does impact users who previously used the adt-setup-vm script for e. g. building Debian images with vmdebootstrap. This script is now gone, and the generic setup-testbed entirely replaces it.

Misc

Aside from the above, every new version has a handful of bug fixes and minor improvements, see the git log for details. As always, if you are interested in helping out or contributing a new feature, don’t hesitate to contact me or file a bug report.

Read more
Stéphane Graber

TLDR: NorthSec is an incredible security event, our CTF simulates a whole internet for every participating team. This allows us to create just about anything, from a locked down country to millions of vulnerable IoT devices spread across the globe. However that flexibility comes at a high cost hardware-wise, as we’re getting bigger and bigger, we need more and more powerful servers and networking gear. We’re very actively looking for sponsors so get in touch with me!

What’s NorthSec?

NorthSec is one of the biggest on-site Capture The Flag (CTF), security contest in North America. It’s organized yearly over a weekend in Montreal (usually in May) and since the last edition, has been accompanied by a two days security conference before the CTF itself. The rest of this post will only focus on the CTF part though.

the-room

A view of the main room at NorthSec 2015

Teams arrive at the venue on Friday evening, get setup at their table and then get introduced to this year’s scenario and given access to our infrastructure. There they will have to fight their way through challenges, each earning them points and letting them go further and further. On Sunday afternoon, the top 3 teams are awarded their prize and we wrap up for the year.

Size wise, for the past two years we’ve had a physical limit of up to 32 teams of 8 participants and then a bunch of extra unaffiliated visitors. For the 2016 edition, we’re raising this to 50 teams for a grand total of 400 participants, thanks to some shuffling at the venue making some more room for us.

Why is it special?

The above may sound pretty simple and straightforward, however there are a few important details that sets NorthSec apart from other CTFs.

  • It is entirely on-site. There are some very big online CTFs out there but very few on-site ones. Having everyone participating in the same room is valuable from a networking point of view but also ensures fairness by enforcing fixed size teams and equal network bandwidth and latency.
  • Every team gets its very own copy of the whole infrastructure. There are no shared services in the simulated world we provide them. That means one team’s actions cannot impact another.
  • Each simulation is its own virtual world with its own instance of the internet, we use hundreds of LXC containers and thousands of VLANs and networks FOR EVERY TEAM to provide the most realistic and complete environment you can think of.
World map of our fake internet

World map of our fake internet

What’s our infrastructure like?

Due to the very high bandwidth and low latency requirements, most of the infrastructure is hosted on premises and on our hardware. We do plan on offloading Windows virtual machines to a public cloud for the next edition though.

We also provide a mostly legacy free environment to our contestants, all of our challenges are connected to IPv6-only networks and run on 64bit Ubuntu LTS  in LXC with state of the art security configurations.

Our rack

Our rack, on location at NorthSec 2015

 

All in all, for 32 teams (last year’s edition), we had:

  • 48000 virtual network interfaces
  • 2000 virtual carriers
  • 16000 BGP routers
  • 17000 Ubuntu containers
  • 100 Windows virtual machines
  • 20000 routing table entries

And all of that was running on:

  • Two firewalls (DELL SC1425)
  • Two infrastructure servers (DELL SC1425)
  • One management server (HP DL380 G5)
  • Four main contest hosts (HP DL380 G5)
  • Three backup contest hosts (DELL C6100)

On average we had 7 full simulations and 21 virtual machines running on every host (the backup hosts only had one each). That means each of the main contest hosts had:

  • 10500 virtual network interfaces
  • 435 virtual carriers
  • 3500 BGP routers
  • 3700 Ubuntu containers
  • 21 Windows virtual machines
  • 4375 routing table entries

Not too bad for servers that are (SC1425) or are getting close (DL380 G5) to being 10 years old now.

Past infrastructure challenges

In the past editions we’ve found numerous bugs in the various technologies we use when put under such a crazy load:

  • A variety of switch firmware bugs when dealing with several thousand IPv6-only networks.
  • Multiple Linux IPv6 kernel bugs (and one security issue) also related to an excess of IPv6 multicast traffic.
  • Several memory leaks and other bugs in LXC and related components that become very visible when you’re running upwards of 10000 containers.
  • Several more Linux kernel bugs related to performance scaling as we create more and more namespaces and nested namespaces.

As our infrastructure staff is very invested in these technologies by being upstream developers or contributors to the main projects we use, those bugs were all rapidly reported, discussed and fixed. We always look forward to the next NorthSec as an opportunity to test the latest technology at scale in a completely controlled environment.

How can you help?

As I mentioned, we’ve been capped at 32 teams and around 300 attendees for the past two years. Our existing hardware was barely sufficient  to handle  the load during those two editions, we urgently need to refresh our hardware to offer the best possible experience to our participants.

We’re planning on replacing most if not all of our hardware with slightly more recent equivalents, also upgrading from rotating drives to SSDs and improving our network. On the software side, we’ll be upgrading to a newer Linux kernel, possibly to Ubuntu 16.04, switch from btrfs to zfs and from LXC to LXD.

We are a Canadian non-profit organization with all our staff being volunteers so we very heavily rely on sponsors to be able to make the event a success.

If you or your company would like to help by sponsoring our infrastructure, get in touch with me. We have several sponsoring levels and can get you the visibility you’d like, ranging from a mention on our website and at the event to on-site presence with a recruitment booth and even, if our interests align, inclusion of your product in some of our challenges.

Also, as I briefly mentioned at the beginning, we have a two days, single-track conference ahead of the CTF. We’re actively looking for speakers, if you have something interesting to present, the CFP is here.

Extra resources

Read more
Dustin Kirkland


Picture yourself containers on a server
With systemd trees and spawned tty's
Somebody calls you, you answer quite quickly
A world with the density so high

    - Sgt. Graber's LXD Smarts Club Band

Last week, we proudly released Ubuntu 15.10 (Wily) -- the final developer snapshot of the Ubuntu Server before we focus the majority of our attention on quality, testing, performance, documentation, and stability for the Ubuntu 16.04 LTS cycle in the next 6 months.

Notably, LXD has been promoted to the Ubuntu Main archive, now commercially supported by Canonical.  That has enabled us to install LXD by default on all Ubuntu Servers, from 15.10 forward.
Join us for an interactive, live webinar on November 12th at 5pm BST/12pm EST led by James Page, where he will demonstrate LXD as the fastest hypervisor in OpenStack!
That means that every Ubuntu server -- Intel, AMD, ARM, POWER, and even Virtual Machines in the cloud -- is now a full machine container hypervisor, capable of hosting hundreds of machine containers, right out of the box!

LXD in the Sky with Diamonds!  Well, LXD is in the Cloud with Diamond level support from Canonical, anyway.  You can even test it in your web browser here.

The development tree of Xenial (Ubuntu 16.04 LTS) has already inherited this behavior, and we will celebrate this feature broadly through our use of LXD containers in Juju, MAAS, and the reference platform of Ubuntu OpenStack, as well as the new nova-lxd hypervisor in the OpenStack Autopilot within Landscape.

While the young and the restless are already running Wily Ubuntu 15.10, the bold and the beautiful are still bound to their Trusty Ubuntu 14.04 LTS servers.

At Canonical, we understand both motivations, and this is why we have backported LXD to the Trusty archives, for safe, simple consumption and testing of this new generation of machine containers there, on your stable LTS.

Installing LXD on Trusty simply requires enabling the trusty-backports pocket, and installing the lxd package from there, with these 3 little commands:

sudo sed -i -e "/trusty-backports/ s/^# //" /etc/apt/sources.list
sudo apt-get update; sudo apt-get dist-upgrade -y
sudo apt-get -t trusty-backports install lxd

In minutes, you can launch your first LXD containers.  First, inherit your new group permissions, so you can execute the lxc command as your non-root user.  Then, import some images, and launch a new container named lovely-rita.  Shell into that container, and examine the process tree, install some packages, check the disk and memory and cpu available.  Finally, exit when you're done, and optionally delete the container.

newgrp lxd
lxd-images import ubuntu --alias ubuntu
lxc launch ubuntu lovely-rita
lxc list
lxc exec lovely-rita bash
ps -ef
apt-get update
df -h
free
cat /proc/cpuinfo
exit
lxc delete lovely-rita

I was able to run over 600 containers simultaneously on my Thinkpad (x250, 16GB of RAM), and over 60 containers on an m1.small in Amazon (1.6GB of RAM).

We're very interested in your feedback, as LXD is one of the most important features of the Ubuntu 16.04 LTS.  You can learn more about LXD, view the source code, file bugs, discuss on the mailing list, and peruse the Linux Containers upstream projects.

With a little help from my friends!
:-Dustin

Read more
Dustin Kirkland


Canonical is delighted to sponsor ContainerCon 2015, a Linux Foundation event in Seattle next week, August 17-19, 2015. It's quite exciting to see the A-list of sponsors, many of them newcomers to this particular technology, teaming with energy around containers. 

From chroots to BSD Jails and Solaris Zones, the concepts behind containers were established decades ago, and in fact traverse the spectrum of server operating systems. At Canonical, we've been working on containers in Ubuntu for more than half a decade, providing a home and resources for stewardship and maintenance of the upstream Linux Containers (LXC) project since 2010.

Last year, we publicly shared our designs for LXD -- a new stratum on top of LXC that endows the advantages of a traditional hypervisor into the faster, more efficient world of containers.

Those designs are now reality, with the open source Golang code readily available on Github, and Ubuntu packages available in a PPA for all supported releases of Ubuntu, and already in the Ubuntu 15.10 beta development tree. With ease, you can launch your first LXD containers in seconds, following this simple guide.

LXD is a persistent daemon that provides a clean RESTful interface to manage (start, stop, clone, migrate, etc.) any of the containers on a given host.

Hosts running LXD are handily federated into clusters of container hypervisors, and can work as Nova Compute nodes in OpenStack, for example, delivering Infrastructure-as-a-Service cloud technology at lower costs and greater speeds.

Here, LXD and Docker are quite complementary technologies. LXD furnishes a dynamic platform for "system containers" -- containers that behave like physical or virtual machines, supplying all of the functionality of a full operating system (minus the kernel, which is shared with the host). Such "machine containers" are the core of IaaS clouds, where users focus on instances with compute, storage, and networking that behave like traditional datacenter hardware.

LXD runs perfectly well along with Docker, which supplies a framework for "application containers" -- containers that enclose individual processes that often relate to one another as pools of micro services and deliver complex web applications.

Moreover, the Zen of LXD is the fact that the underlying container implementation is actually decoupled from the RESTful API that drives LXD functionality. We are most excited to discuss next week at ContainerCon our work with Microsoft around the LXD RESTful API, as a cross-platform container management layer.

Ben Armstrong, a Principal Program Manager Lead at Microsoft on the core virtualization and container technologies, has this to say:
“As Microsoft is working to bring Windows Server Containers to the world – we are excited to see all the innovation happening across the industry, and have been collaborating with many projects to encourage and foster this environment. Canonical’s LXD project is providing a new way for people to look at and interact with container technologies. Utilizing ‘system containers’ to bring the advantages of container technology to the core of your cloud infrastructure is a great concept. We are looking forward to seeing the results of our engagement with Canonical in this space.”
Finally, if you're in Seattle next week, we hope you'll join us for the technical sessions we're leading at ContainerCon 2015, including: "Putting the D in LXD: Migration of Linux Containers", "Container Security - Past, Present, and Future", and "Large Scale Container Management with LXD and OpenStack". Details are below.
Date: Monday, August 17 • 2:20pm - 3:10pm
Title: Large Scale Container Management with LXD and OpenStack
Speaker: Stéphane Graber
Abstracthttp://sched.co/3YK6
Location: Grand Ballroom B
Schedulehttp://sched.co/3YK6 
Date: Wednesday, August 19 10:25am-11:15am
Title: Putting the D in LXD: Migration of Linux Containers
Speaker: Tycho Andersen
Abstract: http://sched.co/3YTz
Location: Willow A
Schedule: http://sched.co/3YTz
Date: Wednesday, August 19 • 3:00pm - 3:50pm
Title: Container Security - Past, Present and Future
Speaker: Serge Hallyn
Abstract: http://sched.co/3YTl
Location: Ravenna
Schedule: http://sched.co/3YTl
Cheers,
Dustin

Read more
Dustin Kirkland

652 Linux containers running on a Laptop?  Are you kidding me???

A couple of weeks ago, at the OpenStack Summit in Vancouver, Canonical released the results of some scalability testing of Linux containers (LXC) managed by LXD.

Ryan Harper and James Page presented their results -- some 536 Linux containers on a very modest little Intel server (16GB of RAM), versus 37 KVM virtual machines.

Ryan has published the code he used for the benchmarking, and I've used to to reproduce the test on my dev laptop (Thinkpad x230, 16GB of RAM, Intel i7-3520M).

I managed to pack a whopping 652 Ubuntu 14.04 LTS (Trusty) containers on my Ubuntu 15.04 (Vivid) laptop!


The system load peaked at 1056 (!!!), but I was using merely 56% of 15.4GB of system memory.  Amazingly, my Unity desktop and Byobu command line were still perfectly responsive, as were the containers that I ssh'd into.  (Aside: makes me wonder if the Linux system load average is accounting for container process correctly...)


Check out the process tree for a few hundred system containers here!

As for KVM, I managed to launch 31 virtual machines without KSM enabled, and 65 virtual machines with KSM enabled and working hard.  So that puts somewhere between 10x - 21x as many containers as virtual machines on the same laptop.

You can now repeat these tests, if you like.  Please share your results with #LXD on Google+ or Twitter!

I'd love to see someone try this in AWS, anywhere from an m3.small to an r3.8xlarge, and share your results ;-)

Density test instructions

## Install lxd
$ sudo add-apt-repository ppa:ubuntu-lxc/lxd-git-master
$ sudo apt-get update
$ sudo apt-get install -y lxd bzr
$ cd /tmp
## At this point, it's a good idea to logout/login or reboot
## for your new group permissions to get applied
## Grab the tests, disable the tools download
$ bzr branch lp:~raharper/+junk/density-check
$ cd density-check
$ mkdir lxd_tools
## Periodically squeeze your cache
$ sudo bash -x -c 'while true; do sleep 30; \
echo 3 | sudo tee /proc/sys/vm/drop_caches; \
free; done' &
## Run the LXD test
$ ./density-check-lxd --limit=mem:512m --load=idle release=trusty arch=amd64
## Run the KVM test
$ ./density-check-kvm --limit=mem:512m --load=idle release=trusty arch=amd64

As for the speed-of-launch test, I'll cover that in a follow-up post!

Can you contain your excitement?

Cheers!
Dustin

Read more
Stéphane Graber

Introduction

For the past 6 months, Serge Hallyn, Tycho Andersen, Chuck Short, Ryan Harper and myself have been very busy working on a new container project called LXD.

Ubuntu 15.04, due to be released this Thursday, will contain LXD 0.7 in its repository. This is still the early days and while we’re confident LXD 0.7 is functional and ready for users to experiment, we still have some work to do before it’s ready for critical production use.

LXD logo

So what’s LXD?

LXD is what we call our container “lightervisor”. The core of LXD is a daemon which offers a REST API to drive full system containers just like you’d drive virtual machines.

The LXD daemon runs on every container host and client tools then connect to those to manage those containers or to move or copy them to another LXD.

We provide two such clients:

  • A command line tool called “lxc”
  • An OpenStack Nova plugin called nova-compute-lxd

The former is mostly aimed at small deployments ranging from a single machine (your laptop) to a few dozen hosts. The latter seamlessly integrates inside your OpenStack infrastructure and lets you manage containers exactly like you would virtual machines.

Why LXD?

LXC has been around for about 7 years now, it evolved from a set of very limited tools which would get you something only marginally better than a chroot, all the way to the stable set of tools, stable library and active user and development community that we have today.

Over those years, a lot of extra security features were added to the Linux kernel and LXC grew support for all of them. As we saw the need for people to build their own solution on top of LXC, we’ve developed a public API and a set of bindings. And last year, we’ve put out our first long term support release which has been a great success so far.

That being said, for a while now, we’ve been wanting to do a few big changes:

  • Make LXC secure by default (rather than it being optional).
  • Completely rework the tools to make them simpler and less confusing to newcomers.
  • Rely on container images rather than using “templates” to build them locally.
  • Proper checkpoint/restore support (live migration).

Unfortunately, solving any of those means doing very drastic changes to LXC which would likely break our existing users or at least force them to rethink the way they do things.

Instead, LXD is our opportunity to start fresh. We’re keeping LXC as the great low level container manager that it is. And build LXD on top of it, using LXC’s API to do all the low level work. That achieves the best of both worlds, we keep our low level container manager with its API and bindings but skip using its tools and templates, instead replacing those by the new experience that LXD provides.

How does LXD relate to LXC, Docker, Rocket and other container projects?

LXD is currently based on top of LXC. It uses the stable LXC API to do all the container management behind the scene, adding the REST API on top and providing a much simpler, more consistent user experience.

The focus of LXD is on system containers. That is, a container which runs a clean copy of a Linux distribution or a full appliance. From a design perspective, LXD doesn’t care about what’s running in the container.

That’s very different from Docker or Rocket which are application container managers (as opposed to system container managers) and so focus on distributing apps as containers and so very much care about what runs inside the container.

There is absolutely nothing wrong with using LXD to run a bunch of full containers which then run Docker or Rocket inside of them to run their different applications. So letting LXD manage the host resources for you, applying all the security restrictions to make the container safe and then using whatever application distribution mechanism you want inside.

Getting started with LXD

The simplest way for somebody to try LXD is by using it with its command line tool. This can easily be done on your laptop or desktop machine.

On an Ubuntu 15.04 system (or by using ppa:ubuntu-lxc/lxd-stable on 14.04 or above), you can install LXD with:

sudo apt-get install lxd

Then either logout and login again to get your group membership refreshed, or use:

newgrp lxd

From that point on, you can interact with your newly installed LXD daemon.

The “lxc” command line tool lets you interact with one or multiple LXD daemons. By default it will interact with the local daemon, but you can easily add more of them.

As an easy way to start experimenting with remote servers, you can add our public LXD server at https://images.linuxcontainers.org:8443
That server is an image-only read-only server, so all you can do with it is list images, copy images from it or start containers from it.

You’ll have to do the following to: add the server, list all of its images and then start a container from one of them:

lxc remote add images images.linuxcontainers.org
lxc image list images:
lxc launch images:ubuntu/trusty/i386 ubuntu-32

What the above does is define a new “remote” called “images” which points to images.linuxcontainers.org. Then list all of its images and finally start a local container called “ubuntu-32” from the ubuntu/trusty/i386 image. The image will automatically be cached locally so that future containers are started instantly.

The “<remote name>:” syntax is used throughout the lxc client. When not specified, the default “local” remote is assumed. Should you only care about managing a remote server, the default remote can be changed with “lxc remote set-default”.

Now that you have a running container, you can check its status and IP information with:

lxc list

Or get even more details with:

lxc info ubuntu-32

To get a shell inside the container, or to run any other command that you want, you may do:

lxc exec ubuntu-32 /bin/bash

And you can also directly pull or push files from/to the container with:

lxc file pull ubuntu-32/path/to/file .
lxc file push /path/to/file ubuntu-32/

When done, you can stop or delete your container with one of those:

lxc stop ubuntu-32
lxc delete ubuntu-32

What’s next?

The above should be a reasonably comprehensive guide to how to use LXD on a single system. Of course, that’s not the most interesting thing to do with LXD. All the commands shown above can work against multiple hosts, containers can be remotely created, moved around, copied, …

LXD also supports live migration, snapshots, configuration profiles, device pass-through and more.

I intend to write some more posts to cover those use cases and features as well as highlight some of the work we’re currently busy doing.

LXD is a pretty young but very active project. We’ve had great contributions from existing LXC developers as well as newcomers.

The project is entirely developed in the open at https://github.com/lxc/lxd. We keep track of upcoming features and improvements through the project’s issue tracker, so it’s easy to see what will be coming soon. We also have a set of issues marked “Easy” which are meant for new contributors as easy ways to get to know the LXD code and contribute to the project.

LXD is an Apache2 licensed project, written in Go and which doesn’t require a CLA to contribute to (we do however require the standard DCO Signed-off-by). It can be built with both golang and gccgo and so works on almost all architectures.

Extra resources

More information can be found on the official LXD website:
https://linuxcontainers.org/lxd

The code, issues and pull requests can all be found on Github:
https://github.com/lxc/lxd

And a good overview of the LXD design and its API may be found in our specs:
https://github.com/lxc/lxd/tree/master/specs

Conclusion

LXD is a new and exciting project. It’s an amazing opportunity to think fresh about system containers and provide the best user experience possible, alongside great features and rock solid security.

With 7 releases and close to a thousand commits by 20 contributors, it’s a very active, fast paced project. Lots of things still remain to be implemented before we get to our 1.0 milestone release in early 2016 but looking at what was achieved in just 5 months, I’m confident we’ll have an incredible LXD in another 12 months!

For now, we’d welcome your feedback, so install LXD, play around with it, file bugs and let us know what’s important for you next.

Read more
Dustin Kirkland

Earlier this week, here in Paris, at the OpenStack Design Summit, Mark Shuttleworth and Canonical introduced our vision and proof of concept for LXD.

You can find the official blog post on Canonical Insights, and a short video introduction on Youtube (by yours truly).

Our Canonical colleague Stephane Graber posted a bit more technical design detail here on the lxc-devel mailing list, which was picked up by HackerNews.  And LWN published a story yesterday covering another Canonical colleague of ours, Serge Hallyn, and his work on Cgroups and CGManager, all of which feeds into LXD.  As it happens, Stephane and Serge are upstream co-maintainers of Linux Containers.  Tycho Andersen, another colleague of ours, has been working on CRIU, which was the heart of his amazing demo this week, live migrating a container running the cult classic 1st person shooter, Doom! between two containers, back and forth.


Moreover, we've answered a few journalists' questions for excellent articles on ZDnet and SynergyMX.  Predictably, El Reg is skeptical (which isn't necessarily a bad thing).  But unfortunately, The Var Guy doesn't quite understand the technology (and unfortunately uses this article to conflate LXD with other random Canonical/Ubuntu complaints).

In any case, here's a bit more about LXD, in my own words...

Our primary design goal with LXD, is to extend containers into process based systems that behave like virtual machines.

We love KVM for its total machine abstraction, as a full virtualization hypervisor.  Moreover, we love what Docker does for application level development, confinement, packaging, and distribution.

But as an operating system and Linux distribution, our customers are, in fact, asking us for complete operating systems that boot and function within a Linux Container's execution space, natively.

Linux Containers are essential to our reference architecture of OpenStack, where we co-locate multiple services on each host.  Nearly every host is a Nova compute node, as well as a Ceph storage node, and also run a couple of units of "OpenStack overhead", such as MySQL, RabbitMQ, MongoDB, etc.  Rather than running each of those services all on the same physical system, we actually put each of them in their own container, with their own IP address, namespace, cgroup, etc.  This gives us tremendous flexibility, in the orchestration of those services.  We're able to move (migrate, even live migrate) those services from one host to another.  With that, it becomes possible to "evacuate" a given host, by moving each contained set of services elsewhere, perhaps a larger or smaller system, and then shut down the unit (perhaps to replace a hard drive or memory, or repurpose it entirely).

Containers also enable us to similarly confine services on virtual machines themselves!  Let that sink in for a second...  A contained workload is able, then, to move from one virtual machine to another, to a bare metal system.  Even from one public cloud provider, to another public or private cloud!

The last two paragraphs capture a few best practices that what we've learned over the last few years implementing OpenStack for some of the largest telcos and financial services companies in the world.  What we're hearing from Internet service and cloud providers is not too dissimilar...  These customers have their own customers who want cloud instances that perform at bare metal equivalence.  They also want to maximize the utilization of their server hardware, sometimes by more densely packing workloads on given systems.

As such, LXD is then a convergence of several different customer requirements, and our experience deploying some massively complex, scalable workloads (a la OpenStack, Hadoop, and others) in enterprises. 

The rapid evolution of a few key technologies under and around LXC have recently made this dream possible.  Namely: User namespaces, Cgroups, SECCOMP, AppArmorCRIU, as well as the library abstraction that our external tools use to manage these containers as systems.

LXD is a new "hypervisor" in that it provides (REST) APIs that can manage Linux Containers.  This is a step function beyond where we've been to date: able to start and stop containers with local commands and, to a limited extent, libvirt, but not much more.  "Booting" a system, in a container, running an init system, bringing up network devices (without nasty hacks in the container's root filesystem), etc. was challenging, but we've worked our way all of these, and Ubuntu boots unmodified in Linux Containers today.

Moreover, LXD is a whole new semantic for turning any machine -- Intel, AMD, ARM, POWER, physical, or even a virtual machine (e.g. your cloud instances) -- into a system that can host and manage and start and stop and import and export and migrate multiple collections of services bundled within containers.

I've received a number of questions about the "hardware assisted" containerization slide in my deck.  We're under confidentiality agreements with vendors as to the details and timelines for these features.

What (I think) I can say, is that there are hardware vendors who are rapidly extending some of the key features that have made cloud computing and virtualization practical, toward the exciting new world of Linux Containers.  Perhaps you might read a bit about CPU VT extensions, No Execute Bits, and similar hardware security technologies.  Use your imagination a bit, and you can probably converge on a few key concepts that will significantly extend the usefulness of Linux Containers.

As soon as such hardware technology is enabled in Linux, you have our commitment that Ubuntu will bring those features to end users faster than anyone else!

If you want to play with it today, you can certainly see the primitives within Ubuntu's LXC.  Launch Ubuntu containers within LXC and you'll start to get the general, low level idea.  If you want to view it from one layer above, give our new nova-compute-flex (flex was the code name, before it was released as LXD), a try.  It's publicly available as a tech preview in Ubuntu OpenStack Juno (authored by Chuck Short, Scott Moser, and James Page).  Here, you can launch OpenStack instances as LXC containers (rather than KVM virtual machines), as "general purpose" system instances.

Finally, perhaps lost in all of the activity here, is a couple of things we're doing different for the LXD project.  We at Canonical have taken our share of criticism over the years about choice of code hosting (our own Bazaar and Launchpad.net), our preferred free software licence (GPLv3/AGPLv3), and our contributor license agreement (Canonical CLA).   [For the record: I love bzr/Launchpad, prefer GPL/AGPL, and am mostly ambivalent on the CLA; but I won't argue those points here.]
  1. This is a public, community project under LinuxContainers.org
  2. The code and design documents are hosted on Github
  3. Under an Apache License
  4. Without requiring signatures of the Canonical CLA
These have been very deliberate, conscious decisions, lobbied for and won by our engineers leading the project, in the interest of collaborating and garnering the participation of communities that have traditionally shunned Canonical-led projects, raising the above objections.  I, for one, am eager to see contribution and collaboration that too often, we don't see.

Cheers!
:-Dustin

Read more
Dustin Kirkland

Say it with me, out loud.  Lex.  See.  Lex-see.  LXC.

Now, change the "see" to a "dee".  Lex.  Dee.  Lex-dee.  LXD.

Easy!

Earlier this week, here in Paris, at the OpenStack Design Summit, Mark Shuttleworth and Canonical introduced our vision and proof of concept for LXD.

You can find the official blog post on Canonical Insights, and a short video introduction on Youtube (by yours truly).

Our Canonical colleague Stephane Graber posted a bit more technical design detail here on the lxc-devel mailing list, which was picked up by HackerNews.  And LWN published a story yesterday covering another Canonical colleague of ours, Serge Hallyn, and his work on Cgroups and CGManager, all of which feeds into LXD.  As it happens, Stephane and Serge are upstream co-maintainers of Linux Containers.  Tycho Andersen, another colleague of ours, has been working on CRIU, which was the heart of his amazing demo this week, live migrating a container running the cult classic 1st person shooter, Doom! between two containers, back and forth.



Moreover, we've answered a few journalists' questions for excellent articles on ZDnet and SynergyMX.  Predictably, El Reg is skeptical (which isn't necessarily a bad thing).  But unfortunately, The Var Guy doesn't quite understand the technology (and unfortunately uses this article to conflate LXD with other random Canonical/Ubuntu complaints).

In any case, here's a bit more about LXD, in my own words...

Our primary design goal with LXD, is to extend containers into process based systems that behave like virtual machines.

We love KVM for its total machine abstraction, as a full virtualization hypervisor.  Moreover, we love what Docker does for application level development, confinement, packaging, and distribution.

But as an operating system and Linux distribution, our customers are, in fact, asking us for complete operating systems that boot and function within a Linux Container's execution space, natively.

Linux Containers are essential to our reference architecture of OpenStack, where we co-locate multiple services on each host.  Nearly every host is a Nova compute node, as well as a Ceph storage node, and also run a couple of units of "OpenStack overhead", such as MySQL, RabbitMQ, MongoDB, etc.  Rather than running each of those services all on the same physical system, we actually put each of them in their own container, with their own IP address, namespace, cgroup, etc.  This gives us tremendous flexibility, in the orchestration of those services.  We're able to move (migrate, even live migrate) those services from one host to another.  With that, it becomes possible to "evacuate" a given host, by moving each contained set of services elsewhere, perhaps a larger or smaller system, and then shut down the unit (perhaps to replace a hard drive or memory, or repurpose it entirely).

Containers also enable us to similarly confine services on virtual machines themselves!  Let that sink in for a second...  A contained workload is able, then, to move from one virtual machine to another, to a bare metal system.  Even from one public cloud provider, to another public or private cloud!

The last two paragraphs capture a few best practices that what we've learned over the last few years implementing OpenStack for some of the largest telcos and financial services companies in the world.  What we're hearing from Internet service and cloud providers is not too dissimilar...  These customers have their own customers who want cloud instances that perform at bare metal equivalence.  They also want to maximize the utilization of their server hardware, sometimes by more densely packing workloads on given systems.

As such, LXD is then a convergence of several different customer requirements, and our experience deploying some massively complex, scalable workloads (a la OpenStack, Hadoop, and others) in enterprises. 

The rapid evolution of a few key technologies under and around LXC have recently made this dream possible.  Namely: User namespaces, Cgroups, SECCOMP, AppArmorCRIU, as well as the library abstraction that our external tools use to manage these containers as systems.

LXD is a new "hypervisor" in that it provides (REST) APIs that can manage Linux Containers.  This is a step function beyond where we've been to date: able to start and stop containers with local commands and, to a limited extent, libvirt, but not much more.  "Booting" a system, in a container, running an init system, bringing up network devices (without nasty hacks in the container's root filesystem), etc. was challenging, but we've worked our way all of these, and Ubuntu boots unmodified in Linux Containers today.

Moreover, LXD is a whole new semantic for turning any machine -- Intel, AMD, ARM, POWER, physical, or even a virtual machine (e.g. your cloud instances) -- into a system that can host and manage and start and stop and import and export and migrate multiple collections of services bundled within containers.

I've received a number of questions about the "hardware assisted" containerization slide in my deck.  We're under confidentiality agreements with vendors as to the details and timelines for these features.

What (I think) I can say, is that there are hardware vendors who are rapidly extending some of the key features that have made cloud computing and virtualization practical, toward the exciting new world of Linux Containers.  Perhaps you might read a bit about CPU VT extensions, No Execute Bits, and similar hardware security technologies.  Use your imagination a bit, and you can probably converge on a few key concepts that will significantly extend the usefulness of Linux Containers.

As soon as such hardware technology is enabled in Linux, you have our commitment that Ubuntu will bring those features to end users faster than anyone else!

If you want to play with it today, you can certainly see the primitives within Ubuntu's LXC.  Launch Ubuntu containers within LXC and you'll start to get the general, low level idea.  If you want to view it from one layer above, give our new nova-compute-flex (flex was the code name, before it was released as LXD), a try.  It's publicly available as a tech preview in Ubuntu OpenStack Juno (authored by Chuck Short, Scott Moser, and James Page).  Here, you can launch OpenStack instances as LXC containers (rather than KVM virtual machines), as "general purpose" system instances.

Finally, perhaps lost in all of the activity here, is a couple of things we're doing different for the LXD project.  We at Canonical have taken our share of criticism over the years about choice of code hosting (our own Bazaar and Launchpad.net), our preferred free software licence (GPLv3/AGPLv3), and our contributor license agreement (Canonical CLA).   [For the record: I love bzr/Launchpad, prefer GPL/AGPL, and am mostly ambivalent on the CLA; but I won't argue those points here.]
  1. This is a public, community project under LinuxContainers.org
  2. The code and design documents are hosted on Github
  3. Under an Apache License
  4. Without requiring signatures of the Canonical CLA
These have been very deliberate, conscious decisions, lobbied for and won by our engineers leading the project, in the interest of collaborating and garnering the participation of communities that have traditionally shunned Canonical-led projects, raising the above objections.  I, for one, am eager to see contribution and collaboration that too often, we don't see.

Cheers!
:-Dustin

Read more
pitti

I’m on my way home from Düsseldorf where I attended the LinuxCon Europe and Linux Plumber conferences. I was quite surprised how huge LinuxCon was, there were about 1.500 people there! Certainly much more than last year in New Orleans.

Containers (in both LXC and docker flavors) are the Big Thing everybody talks about and works with these days; there was hardly a presentation where these weren’t mentioned at all, and (what felt like) half of the presentations were either how to improve these, or how to use these technologies to solve problems. For example, some people/companies really take LXC to the max and try to do everything in them including tasks which in the past you had only considered full VMs for, like untrusted third-party tenants. For example there was an interesting talk how to secure networking for containers, and pretty much everyone uses docker or LXC now to deploy workloads, run CI tests. There are projects like “fleet” which manage systemd jobs across an entire cluster of containers (distributed task scheduler) or like project-builder.org which auto-build packages from each commit of projects.

Another common topic is the trend towards building/shipping complete (r/o) system images, atomic updates and all that goodness. The central thing here was certainly “Stateless systems, factory reset, and golden images” which analyzed the common requirements and proposed how to implement this with various package systems and scenarios. In my opinion this is certainly the way to go, as our current solution on Ubuntu Touch (i. e. Ubuntu’s system-image) is far too limited and static yet, it doesn’t extend to desktops/servers/cloud workloads at all. It’s also a lot of work to implement this properly, so it’s certainly understandable that we took that shortcut for prototyping and the relatively limited Touch phone environment.

On Plumbers my main occupations were mostly the highly interesting LXC track to see what’s coming in the container world, and the systemd hackfest. On the latter I was again mostly listening (after all, I’m still learning most of the internals there..) and was able to work on some cleanups and improvements like getting rid of some of Debian’s patches and properly run the test suite. It was also great to sync up again with David Zeuthen about the future of udisks and some particular proposed new features. Looks like I’m the de-facto maintainer now, so I’ll need to spend some time soon to review/include/clean up some much requested little features and some fixes.

All in all a great week to meet some fellows of the FOSS world a gain, getting to know a lot of new interesting people and projects, and re-learning to drink beer in the evening (I hardly drink any at home :-P).

If you are interested you can also see my raw notes, but beware that there are mostly just scribbling.

Now, off to next week’s Canonical meeting in Washington, DC!

Read more
Stéphane Graber

I often have to deal with VPNs, either to connect to the company network, my own network when I’m abroad or to various other places where I’ve got servers I manage.

All of those VPNs use OpenVPN, all with a similar configuration and unfortunately quite a lot of them with overlapping networks. That means that when I connect to them, parts of my own network are no longer reachable or it means that I can’t connect to more than one of them at once.

Those I suspect are all pretty common issues with VPN users, especially those working with or for companies who over the years ended up using most of the rfc1918 subnets.

So I thought, I’m working with containers every day, nowadays we have those cool namespaces in the kernel which let you run crazy things as a a regular user, including getting your own, empty network stack, so why not use that?

Well, that’s what I ended up doing and so far, that’s all done in less than 100 lines of good old POSIX shell script :)

That gives me, fully unprivileged non-overlapping VPNs! OpenVPN and everything else run as my own user and nobody other than the user spawning the container can possibly get access to the resources behind the VPN.

The code is available at: git clone git://github.com/stgraber/vpn-container

Then it’s as simple as: ./start-vpn VPN-NAME CONFIG

What happens next is the script will call socat to proxy the VPN TCP socket to a UNIX socket, then a user namespace, network namespace, mount namespace and uts namespace are all created for the container. Your user is root in that namespace and so can start openvpn and create network interfaces and routes. With careful use of some bind-mounts, resolvconf and byobu are also made to work so DNS resolution is functional and we can start byobu to easily allow as many shell as you want in there.

In the end it looks like this:

stgraber@dakara:~/vpn$ ./start-vpn stgraber.net ../stgraber-vpn/stgraber.conf 
WARN: could not reopen tty: No such file or directory
lxc: call to cgmanager_move_pid_abs_sync(name=systemd) failed: invalid request
Fri Sep 26 17:48:07 2014 OpenVPN 2.3.2 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [EPOLL] [PKCS11] [eurephia] [MH] [IPv6] built on Feb  4 2014
Fri Sep 26 17:48:07 2014 WARNING: No server certificate verification method has been enabled.  See http://openvpn.net/howto.html#mitm for more info.
Fri Sep 26 17:48:07 2014 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Fri Sep 26 17:48:07 2014 Attempting to establish TCP connection with [AF_INET]127.0.0.1:1194 [nonblock]
Fri Sep 26 17:48:07 2014 TCP connection established with [AF_INET]127.0.0.1:1194
Fri Sep 26 17:48:07 2014 TCPv4_CLIENT link local: [undef]
Fri Sep 26 17:48:07 2014 TCPv4_CLIENT link remote: [AF_INET]127.0.0.1:1194
Fri Sep 26 17:48:09 2014 [vorash.stgraber.org] Peer Connection Initiated with [AF_INET]127.0.0.1:1194
Fri Sep 26 17:48:12 2014 TUN/TAP device tun0 opened
Fri Sep 26 17:48:12 2014 Note: Cannot set tx queue length on tun0: Operation not permitted (errno=1)
Fri Sep 26 17:48:12 2014 do_ifconfig, tt->ipv6=1, tt->did_ifconfig_ipv6_setup=1
Fri Sep 26 17:48:12 2014 /sbin/ip link set dev tun0 up mtu 1500
Fri Sep 26 17:48:12 2014 /sbin/ip addr add dev tun0 172.16.35.50/24 broadcast 172.16.35.255
Fri Sep 26 17:48:12 2014 /sbin/ip -6 addr add 2001:470:b368:1035::50/64 dev tun0
Fri Sep 26 17:48:12 2014 /etc/openvpn/update-resolv-conf tun0 1500 1544 172.16.35.50 255.255.255.0 init
dhcp-option DNS 172.16.20.30
dhcp-option DNS 172.16.20.31
dhcp-option DNS 2001:470:b368:1020:216:3eff:fe24:5827
dhcp-option DNS nameserver
dhcp-option DOMAIN stgraber.net
Fri Sep 26 17:48:12 2014 add_route_ipv6(2607:f2c0:f00f:2700::/56 -> 2001:470:b368:1035::1 metric -1) dev tun0
Fri Sep 26 17:48:12 2014 add_route_ipv6(2001:470:714b::/48 -> 2001:470:b368:1035::1 metric -1) dev tun0
Fri Sep 26 17:48:12 2014 add_route_ipv6(2001:470:b368::/48 -> 2001:470:b368:1035::1 metric -1) dev tun0
Fri Sep 26 17:48:12 2014 add_route_ipv6(2001:470:b511::/48 -> 2001:470:b368:1035::1 metric -1) dev tun0
Fri Sep 26 17:48:12 2014 add_route_ipv6(2001:470:b512::/48 -> 2001:470:b368:1035::1 metric -1) dev tun0
Fri Sep 26 17:48:12 2014 Initialization Sequence Completed


To attach to this VPN, use: byobu -S /home/stgraber/vpn/stgraber.net.byobu
To kill this VPN, do: byobu -S /home/stgraber/vpn/stgraber.net.byobu kill-server
or from inside byobu: byobu kill-server

After that, just copy/paste the byobu command and you’ll get a shell inside the container. Don’t be alarmed by the fact that you’re root in there. root is mapped to your user’s uid and gid outside the container so it’s actually just your usual user but with a different name and with privileges against the resources owned by the container.

You can now use the VPN as you want without any possible overlap or conflict with any route or VPN you may be running on that system and with absolutely no possibility that a user sharing your machine may access your running VPN.

This has so far been tested with 5 different VPNs, on a regular Ubuntu 14.04 LTS system with all VPNs being TCP based. UDP based VPNs would probably just need a couple of tweaks to the socat unix-socket proxy.

Enjoy!

Read more
Dustin Kirkland


Docker 1.0.1 is available for testing, in Ubuntu 14.04 LTS!

Docker 1.0.1 has landed in the trusty-proposed archive, which we hope to SRU to trusty-updates very soon.  We would love to have your testing feedback, to ensure both upgrades from Docker 0.9.1, as well as new installs of Docker 1.0.1 behave well, and are of the highest quality you have come to expect from Ubuntu's LTS  (Long Term Stable) releases!  Please file any bugs or issues here.

Moreover, this new version of the Docker package now installs the Docker binary to /usr/bin/docker, rather than /usr/bin/docker.io in previous versions. This should help Ubuntu's Docker package more closely match the wealth of documentation and examples available from our friends upstream.

A big thanks to Paul Tagliamonte, James Page, Nick Stinemates, Tianon Gravi, and Ryan Harper for their help upstream in Debian and in Ubuntu to get this package updated in Trusty!  Also, it's probably worth mentioning that we're targeting Docker 1.1.2 (or perhaps 1.2.0) for Ubuntu 14.10 (Utopic), which will release on October 23, 2014.

Here are a few commands that might help your testing...

Check What Candidate Versions are Available

$ sudo apt-get update
$ apt-cache show docker.io | grep ^Version:

If that shows 0.9.1~dfsg1-2 (as it should), then you need to enable the trusty-proposed pocket.

$ echo "deb http://archive.ubuntu.com/ubuntu/ trusty-proposed universe" | sudo tee -a /etc/apt/sources.list
$ sudo apt-get update
$ apt-cache show docker.io | grep ^Version:

And now you should see the new version, 1.0.1~dfsg1-0ubuntu1~ubuntu0.14.04.1, available (probably in addition to 1.0.1~dfsg1-0ubuntu1~ubuntu0.14.04.1).

Upgrades

Check if you already have Docker installed, using:

$ dpkg -l docker.io

If so, you can simply upgrade.

$ sudo apt-get upgrade

And now, you can check your Docker version:

$ sudo dpkg -l docker.io | grep -m1 ^ii | awk '{print $3}'
0.9.1~dfsg1-2

New Installations

You can simply install the new package with:

$ sudo apt-get install docker.io

And ensure that you're on the latest version with:

$ dpkg -l docker.io | grep -m1 ^ii | awk '{print $3}'
1.0.1~dfsg1-0ubuntu1~ubuntu0.14.04.1

Running Docker

If you're already a Docker user, you probably don't need these instructions.  But in case you're reading this, and trying Docker for the first time, here's the briefest of quick start guides :-)

$ sudo docker pull ubuntu
$ sudo docker run -i -t ubuntu /bin/bash

And now you're running a bash shell inside of an Ubuntu Docker container.  And only bash!

root@1728ffd1d47b:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 13:42 ? 00:00:00 /bin/bash
root 8 1 0 13:43 ? 00:00:00 ps -ef

If you want to do something more interesting in Docker, well, that's whole other post ;-)

:-Dustin

Read more
Dustin Kirkland





This article is cross-posted on Docker's blog as well.

There is a design pattern, occasionally found in nature, when some of the most elegant and impressive solutions often seem so intuitive, in retrospect.



For me, Docker is just that sort of game changing, hyper-innovative technology, that, at its core,  somehow seems straightforward, beautiful, and obvious.



Linux containers, repositories of popular base images, snapshots using modern copy-on-write filesystem features.  Brilliant, yet so simple.  Docker.io for the win!


I clearly recall nine long months ago, intrigued by a fervor of HackerNews excitement pulsing around a nascent Docker technology.  I followed a set of instructions on a very well designed and tastefully manicured web page, in order to launch my first Docker container.  Something like: start with Ubuntu 13.04, downgrade the kernel, reboot, add an out-of-band package repository, install an oddly named package, import some images, perhaps debug or ignore some errors, and then launch.  In few moments, I could clearly see the beginnings of a brave new world of lightning fast, cleanly managed, incrementally saved, highly dense, operating system containers.

Ubuntu inside of Ubuntu, Inception style.  So.  Much.  Potential.



Fast forward to today -- April 18, 2014 -- and the combination of Docker and Ubuntu 14.04 LTS has raised the bar, introducing a new echelon of usability and convenience, and coupled with the trust and track record of enterprise grade Long Term Support from Canonical and the Ubuntu community.
Big thanks, by the way, to Paul Tagliamonte, upstream Debian packager of Docker.io, as well as all of the early testers and users of Docker during the Ubuntu development cycle.
Docker is now officially in Ubuntu.  That makes Ubuntu 14.04 LTS the first enterprise grade Linux distribution to ship with Docker natively packaged, continuously tested, and instantly installable.  Millions of Ubuntu servers are now never more than three commands away from launching or managing Linux container sandboxes, thanks to Docker.


sudo apt-get install docker.io
sudo docker.io pull ubuntu
sudo docker.io run -i -t ubuntu /bin/bash


And after that last command, Ubuntu is now running within Docker, inside of a Linux container.

Brilliant.

Simple.

Elegant.

User friendly.

Just the way we've been doing things in Ubuntu for nearly a decade. Thanks to our friends at Docker.io!


Cheers,
:-Dustin

Read more
Stéphane Graber

After 10 months of work, over a thousand contributions by 60 or so contributors, we’ve finally released LXC 1.0!

You may have followed my earlier series of blog post on LXC 1.0, well, everything I described in there is now available in a stable release which we intend to support for a long time.

In the immediate future, I expect most of LXC upstream will focus on dealing with the bug reports and questions which will no doubt follow this release, then we’ll have to discuss what our goals for LXC 1.1 are and setup a longer term roadmap to LXC 2.0.

But right now, I’m just happy to have LXC 1.0 out, get a lot more users to play with new technologies like unprivileged containers and play with our API in the various languages we support.

Thanks to everyone who made this possible!

Read more
Stéphane Graber

This is post 10 out of 10 in the LXC 1.0 blog post series.

Logging

Most LXC commands take two options:

  • -o, –logfile=FILE: Location of the logfile (defaults to stder)
  • -l, –logpriority=LEVEL: Log priority (defaults to ERROR)

The valid log priorities are:

  • FATAL
  • ALERT
  • CRIT
  • ERROR
  • WARN
  • NOTICE
  • INFO
  • DEBUG
  • TRACE

FATAL, ALERT and CRIT are mostly unused at this time, ERROR is pretty common and so are the others except for TRACE. If you want to see all possible log entries, set the log priority to TRACE.

There are also two matching configuration options which you can put in your container’s configuration:

  • lxc.logfile
  • lxc.loglevel

They behave exactly like their command like counterparts. However note that if the command line options are passed, any value set in the configuration will be ignored and instead will be overridden by those passed by the user.

When reporting a bug against LXC, it’s usually a good idea to attach a log of the container’s action with a logpriority of at least DEBUG.

API debugging

When debugging a problem using the API it’s often a good idea to try and re-implement the failing bit of code in C using liblxc directly, that helps get the binding out of the way and usually leads to cleaner stack traces and easier bug reports.

It’s also useful to set lxc.loglevel to DEBUG using set_config_item on your container so you can get a log of what LXC is doing.

Testing

Before digging to deep into an issue with the code you are working on, it’s usually a good idea to make sure that LXC itself is behaving as it should on your machine.

To check that, run “lxc-checkconfig” and look for any missing kernel feature, if all looks good, then install (or build) the tests. In Ubuntu, those are shipped in a separate “lxc-tests” package. Most of those tests are expecting to be run on an Ubuntu system (patches welcome…) but should do fine on any distro that’s compatible with the lxc-ubuntu template.

Run each of the lxc-test-* binaries as root and note any failure. Note that it’s possible that they leave some cruft behind on failure, if so, please cleanup any of those leftover containers before processing to the next test as unfortunately that cruft may cause failure by itself.

Reporting bugs

The primary LXC bug tracker is available at: https://github.com/lxc/lxc/issues

You may also report bugs directly through the distributions (though it’s often preferred to still file an upstream bug and then link the two), for example for Ubuntu, LXC bugs are tracked here: https://bugs.launchpad.net/ubuntu/+source/lxc

If you’ve already done some work tracking down the bug, you may also directly contact us on our mailing-list (see below).

Sending patches

We always welcome contributions and are very happy to have such an active development community around LXC (Over 60 people contributed to LXC 1.0). We don’t have many rules governing contributions, we just ask that your contributions be properly licensed and that you own the copyright on the code you are sending us (and indicate so by putting a Signed-off-by line in your commit).

As for the licensing, anything which ends up in the library (liblxc) or its bindings must be LGPLv2.1+ or compatible with it and not adding any additional restriction. Standalone binaries and scripts can either be LGLPv2.1+ (the project default) or GPLv2. If unsure, LGPLv2.1+ is usually a safe bet for any new file in LXC.

Patches may be sent using two different ways:

  • Inline to the lxc-devel@lists.linuxcontainers.org (using git send-email or similar)
  • Using a pull request on github (we will then grab the .patch URL and treat it as if they were e-mails sent to our list)

Getting in touch with us

The primary way of contacting the upstream LXC team is through our mailing-lists. We have two, one for LXC development and one for LXC users questions:

For more real-time discussion, you can also find a lot of LXC users and most of the developers in #lxcontainers on irc.freenode.net.

Final notes

So this is my final blog post before LXC 1.0 is finally released. We’re currently at rc3 with an rc4 coming a bit later today and a final release scheduled for tomorrow evening or Thursday morning.

I hope you have enjoyed this blog post series and that it’ll be a useful reference for people deploying LXC 1.0.

Read more
Stéphane Graber

This is post 9 out of 10 in the LXC 1.0 blog post series.

Some starting notes

This post uses unprivileged containers, this isn’t an hard requirement but makes a lot of sense for GUI applications. Besides, since you followed the whole series, you have those setup anyway, right?

I’ll be using Google Chrome with the Google Talk and Adobe Flash plugins as “hostile” piece of software that I do not wish to allow to run directly on my machine.
Here are a few reasons why:

  • Those are binaries I don’t have source for.
    (That alone is usually enough for me to put an application in a container)
  • Comes from an external (non-Ubuntu) repository which may then push anything they wish to my system.
    (Could be restricted with careful apt pining)
  • Installs a daily cronjob which will re-add said repository and GPG keys should I for some reason choose to remove them.
    (Apparently done to have the repository re-added after dist-upgrades)
  • Uses a setuid wrapper to setup their sandbox. Understandable as they switch namespaces and user namespaces aren’t widespread yet.
    (This can be worked around by turning off the sandbox. The code for the sandbox is also available from the chromium project, though there’s no proof that the binary build doesn’t have anything added to it)
  • I actually need to use those piece of software because of my work environment and because the web hasn’t entirely moved away from flash yet…

While what I’ll be describing below is a huge step up for security in my opinion, it’s still not ideal and a few compromises have to be made to make those software even work. Those are basically access to:

  • pulseaudio: possibly recording you
  • X11: possibly doing key logging or taking pictures of your screen
  • dri/snd devices: can’t think of anything that’s not already covered by the first two, but I’m sure someone with a better imagination than mine will come up with something

But there’s only so much you can do while still having the cool features, so the best you can do is make sure never to run the container while doing sensitive work.

Running Google chrome in a container

So now to the actual fun. The plan is rather simple, I want a simple container, running a stable, well supported, version of Ubuntu, in there I’ll install Google Chrome and any plugin I need, then integrate it with my desktop.

First of all, let’s get ourselves an Ubuntu 12.04 i386 container as that’s pretty well supported by most ISVs:

lxc-create -t download -n precise-gui -- -d ubuntu -r precise -a i386

Let’s tweak the configuration a bit by adding the following to ~/.local/share/lxc/precise-gui/config (replacing USERNAME appropriately):

lxc.mount.entry = /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry = /dev/snd dev/snd none bind,optional,create=dir
lxc.mount.entry = /tmp/.X11-unix tmp/.X11-unix none bind,optional,create=dir
lxc.mount.entry = /dev/video0 dev/video0 none bind,optional,create=file

lxc.hook.pre-start = /home/USERNAME/.local/share/lxc/precise-gui/setup-pulse.sh

Still in that same config file, you’ll have to replace your existing (or similar):

lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536

By something like (this assume your user’s uid/gid is 1000/1000):

lxc.id_map = u 0 100000 1000
lxc.id_map = g 0 100000 1000
lxc.id_map = u 1000 1000 1
lxc.id_map = g 1000 1000 1
lxc.id_map = u 1001 101001 64535
lxc.id_map = g 1001 101001 64535

So those mappings basically mean that your container has 65536 uids and gids mapped to it, starting at 0 up to 65535 in the container. Those are mapped to host ids 100000 to 165535 with one exception, uid and gid 1000 isn’t translated. That trick is needed so that your user in the container can access the X socket, pulseaudio socket and DRI/snd devices just as your own user can (this saves us a whole lot of configuration on the host).

Now that we’re done with the config file, let’s create that setup-pulse.sh script:

#!/bin/sh
PULSE_PATH=$LXC_ROOTFS_PATH/home/ubuntu/.pulse_socket

if [ ! -e "$PULSE_PATH" ] || [ -z "$(lsof -n $PULSE_PATH 2>&1)" ]; then
    pactl load-module module-native-protocol-unix auth-anonymous=1 \
        socket=$PULSE_PATH
fi

Make sure the file is executable or LXC will ignore it!
That script is fairly simple, it’ll simply tell pulseaudio on the host to bind /home/ubuntu/.pulse_socket in the container, checking that it’s not already setup.

Finally, run the following to fix the permissions of your container’s home directory:

sudo chown -R 1000:1000 ~/.local/share/lxc/precise-gui/rootfs/home/ubuntu

And that’s all that’s needed in the LXC side. So let’s start the container and install Google Chrome and the Google Talk plugin in there:

lxc-start -n precise-gui -d
lxc-attach -n precise-gui -- umount /tmp/.X11-unix
lxc-attach -n precise-gui -- apt-get update
lxc-attach -n precise-gui -- apt-get dist-upgrade -y
lxc-attach -n precise-gui -- apt-get install wget ubuntu-artwork dmz-cursor-theme ca-certificates pulseaudio -y
lxc-attach -n precise-gui -- wget https://dl.google.com/linux/direct/google-chrome-stable_current_i386.deb -O /tmp/chrome.deb
lxc-attach -n precise-gui -- wget https://dl.google.com/linux/direct/google-talkplugin_current_i386.deb -O /tmp/talk.deb
lxc-attach -n precise-gui -- dpkg -i /tmp/chrome.deb /tmp/talk.deb
lxc-attach -n precise-gui -- apt-get -f install -y
lxc-attach -n precise-gui -- sudo -u ubuntu mkdir -p /home/ubuntu/.pulse/
echo "disable-shm=yes" | lxc-attach -n precise-gui -- sudo -u ubuntu tee /home/ubuntu/.pulse/client.conf
lxc-stop -n precise-gui

At this point, everything you need is installed in the container.
To make your life easier, create the following launcher script, let’s call it “start-chrome” and put it in the container’s configuration directory (next to config and setup-pulse.sh):

#!/bin/sh
CONTAINER=precise-gui
CMD_LINE="google-chrome --disable-setuid-sandbox $*"

STARTED=false

if ! lxc-wait -n $CONTAINER -s RUNNING -t 0; then
    lxc-start -n $CONTAINER -d
    lxc-wait -n $CONTAINER -s RUNNING
    STARTED=true
fi

PULSE_SOCKET=/home/ubuntu/.pulse_socket

lxc-attach --clear-env -n $CONTAINER -- sudo -u ubuntu -i \
    env DISPLAY=$DISPLAY PULSE_SERVER=$PULSE_SOCKET $CMD_LINE

if [ "$STARTED" = "true" ]; then
    lxc-stop -n $CONTAINER -t 10
fi

Make sure the script is executable or the next step won’t work. This script will check if the container is running, if not, start it (and remember it did), then spawn google-chrome with the right environment set (and disabling its built-in sandbox as for some obscure reasons, it dislikes user namespaces), once google-chrome exits, the container is stopped.

To make things shinier, you may now also create ~/.local/share/applications/google-chrome.desktop containing:

[Desktop Entry]
Version=1.0
Name=Google Chrome
Comment=Access the Internet
Exec=/home/USERNAME/.local/share/lxc/precise-gui/start-chrome %U
Icon=/home/USERNAME/.local/share/lxc/precise-gui/rootfs/opt/google/chrome/product_logo_256.png
Type=Application
Categories=Network;WebBrowser;

Don’t forget to replace USERNAME to your own username so that both paths are valid.

And that’s it! You should now find a Google Chrome icon somewhere in your desktop environment (menu, dash, whatever…). Clicking on it will start Chrome which can be used pretty much as usual, when closed, the container will shutdown.
You may want to setup extra symlinks or bind-mount to make it easier to access things like the Downloads folder but that really depends on what you’re using the container for.

Chrome running in LXC

Obviously, the same process can be used for many different piece of software.

Skype

Quite a few people have contacted me asking about running Skype in that same container. I won’t give you a whole step by step guide as the one for Chrome cover 99% of what you need to do for Skype.

However there are two tricks you need to be aware of to get Skype to work properly:

  • Set “QT_X11_NO_MITSHM” to “1”
    (otherwise you get a blank window as it tries to use shared memory)
  • Set “GNOME_DESKTOP_SESSION_ID” to “this-is-deprecated”
    (otherwise you get an ugly Qt theme)

Those two should be added after the “env” in the launcher script you’ll write for Skype.

Apparently on some NVidia system, you may also need to set an additional environment variable (possibly useful not only for Skype):
LD_PRELOAD=/usr/lib/i386-linux-gnu/mesa/libGL.so.1

Steam

And finally, yet another commonly asked one, Steam.

That one actually doesn’t require anything extra in its environment, just grab the .deb, install it in the container, run an “apt-get -f install” to install any remaining dependency, create a launcher script and .desktop and you’re done.
I’ve been happily playing a few games (thanks to Valve giving those to all Ubuntu and Debian developers) without any problem so far.

Read more
Stéphane Graber

This is post 8 out of 10 in the LXC 1.0 blog post series.

The API

The first version of liblxc was introduced in LXC 0.9 but it was very much at an experimental state. LXC 1.0 however will ship with a much more complete API, covering all of LXC’s features. We’ve actually been rebasing all of our tools (lxc-*) to using that API rather than doing direct calls to the internal functions.

The API also comes with a whole set of tests which we run as part of our continuous integration setup and before distro uploads.

There are also quite a few bindings for those who don’t feel like writing C, we have lua and python3 bindings in-tree upstream and there are official out-of-tree bindings for Go and ruby.

The API documentation can be found at:
https://qa.linuxcontainers.org/master/current/doc/api/

It’s not necessarily the most readable API documentation ever and certainly could do with some examples, especially for the bindings, but it does cover all functions that are exported over the API. Any help improving our API documentation is very much welcome!

The basics

So let’s start with a very simple example of the LXC API using C, the following example will create a new container struct called “apicontainer”, create a root filesystem using the new download template, start the container, print its state and PID number, then attempt a clean shutdown before killing it.

#include <stdio.h>

#include <lxc/lxccontainer.h>

int main() {
    struct lxc_container *c;
    int ret = 1;

    /* Setup container struct */
    c = lxc_container_new("apicontainer", NULL);
    if (!c) {
        fprintf(stderr, "Failed to setup lxc_container struct\n");
        goto out;
    }

    if (c->is_defined(c)) {
        fprintf(stderr, "Container already exists\n");
        goto out;
    }

    /* Create the container */
    if (!c->createl(c, "download", NULL, NULL, LXC_CREATE_QUIET,
                    "-d", "ubuntu", "-r", "trusty", "-a", "i386", NULL)) {
        fprintf(stderr, "Failed to create container rootfs\n");
        goto out;
    }

    /* Start the container */
    if (!c->start(c, 0, NULL)) {
        fprintf(stderr, "Failed to start the container\n");
        goto out;
    }

    /* Query some information */
    printf("Container state: %s\n", c->state(c));
    printf("Container PID: %d\n", c->init_pid(c));

    /* Stop the container */
    if (!c->shutdown(c, 30)) {
        printf("Failed to cleanly shutdown the container, forcing.\n");
        if (!c->stop(c)) {
            fprintf(stderr, "Failed to kill the container.\n");
            goto out;
        }
    }

    /* Destroy the container */
    if (!c->destroy(c)) {
        fprintf(stderr, "Failed to destroy the container.\n");
        goto out;
    }

    ret = 0;
out:
    lxc_container_put(c);
    return ret;
}

So as you can see, it’s not very difficult to use, most functions are fairly straightforward and error checking is pretty simple (most calls are boolean and errors are printed to stderr by LXC depending on the loglevel).

Python3 scripting

As much fun as C may be, I usually like to script my containers and C isn’t really the best language for that. That’s why I wrote and maintain the official python3 binding.

The equivalent to the example above in python3 would be:

import lxc
import sys

# Setup the container object
c = lxc.Container("apicontainer")
if c.defined:
    print("Container already exists", file=sys.stderr)
    sys.exit(1)

# Create the container rootfs
if not c.create("download", lxc.LXC_CREATE_QUIET, {"dist": "ubuntu",
                                                   "release": "trusty",
                                                   "arch": "i386"}):
    print("Failed to create the container rootfs", file=sys.stderr)
    sys.exit(1)

# Start the container
if not c.start():
    print("Failed to start the container", file=sys.stderr)
    sys.exit(1)

# Query some information
print("Container state: %s" % c.state)
print("Container PID: %s" % c.init_pid)

# Stop the container
if not c.shutdown(30):
    print("Failed to cleanly shutdown the container, forcing.")
    if not c.stop():
        print("Failed to kill the container", file=sys.stderr)
        sys.exit(1)

# Destroy the container
if not c.destroy():
    print("Failed to destroy the container.", file=sys.stderr)
    sys.exit(1)

Now for that specific example, python3 isn’t that much simpler than the C equivalent.

But what if we wanted to do something slightly more tricky, like iterating through all existing containers, start them (if they’re not already started), wait for them to have network connectivity, then run updates and shut them down?

import lxc
import sys

for container in lxc.list_containers(as_object=True):
    # Start the container (if not started)
    started=False
    if not container.running:
        if not container.start():
            continue
        started=True

    if not container.state == "RUNNING":
        continue

    # Wait for connectivity
    if not container.get_ips(timeout=30):
        continue

    # Run the updates
    container.attach_wait(lxc.attach_run_command,
                          ["apt-get", "update"])
    container.attach_wait(lxc.attach_run_command,
                          ["apt-get", "dist-upgrade", "-y"])

    # Shutdown the container
    if started:
        if not container.shutdown(30):
            container.stop()

The most interesting bit in the example above is the attach_wait command, which basically lets your run a standard python function in the container’s namespaces, here’s a more obvious example:

import lxc

c = lxc.Container("p1")
if not c.running:
    c.start()

def print_hostname():
    with open("/etc/hostname", "r") as fd:
        print("Hostname: %s" % fd.read().strip())

# First run on the host
print_hostname()

# Then on the container
c.attach_wait(print_hostname)

if not c.shutdown(30):
    c.stop()

And the output of running the above:

stgraber@castiana:~$ python3 lxc-api.py
/home/stgraber/<frozen>:313: Warning: The python-lxc API isn't yet stable and may change at any point in the future.
Hostname: castiana
Hostname: p1

It may take you a little while to wrap your head around the possibilities offered by that function, especially as it also takes quite a few flags (look for LXC_ATTACH_* in the C API) which lets you control which namespaces to attach to, whether to have the function contained by apparmor, whether to bypass cgroup restrictions, …

That kind of flexibility is something you’ll never get with a virtual machine and the way it’s supported through our bindings makes it easier than ever to use by anyone who wants to automate custom workloads.

You can also use the API to script cloning containers and using snapshots (though for that example to work, you need current upstream master due to a small bug I found while writing this…):

import lxc
import os
import sys

if not os.geteuid() == 0:
    print("The use of overlayfs requires privileged containers.")
    sys.exit(1)

# Create a base container (if missing) using an Ubuntu 14.04 image
base = lxc.Container("base")
if not base.defined:
    base.create("download", lxc.LXC_CREATE_QUIET, {"dist": "ubuntu",
                                                   "release": "precise",
                                                   "arch": "i386"})

    # Customize it a bit
    base.start()
    base.get_ips(timeout=30)
    base.attach_wait(lxc.attach_run_command, ["apt-get", "update"])
    base.attach_wait(lxc.attach_run_command, ["apt-get", "dist-upgrade", "-y"])

    if not base.shutdown(30):
        base.stop()

# Clone it as web (if not already existing)
web = lxc.Container("web")
if not web.defined:
    # Clone base using an overlayfs overlay
    web = base.clone("web", bdevtype="overlayfs",
                     flags=lxc.LXC_CLONE_SNAPSHOT)

    # Install apache
    web.start()
    web.get_ips(timeout=30)
    web.attach_wait(lxc.attach_run_command, ["apt-get", "update"])
    web.attach_wait(lxc.attach_run_command, ["apt-get", "install",
                                             "apache2", "-y"])

    if not web.shutdown(30):
        web.stop()

# Create a website container based on the web container
mysite = web.clone("mysite", bdevtype="overlayfs",
                   flags=lxc.LXC_CLONE_SNAPSHOT)
mysite.start()
ips = mysite.get_ips(family="inet", timeout=30)
if ips:
    print("Website running at: http://%s" % ips[0])
else:
    if not mysite.shutdown(30):
        mysite.stop()

The above will create a base container using a downloaded image, then clone it using an overlayfs based overlay, add apache2 to it, then clone that resulting container into yet another one called “mysite”. So “mysite” is effectively an overlay clone of “web” which is itself an overlay clone of “base”.

 

So there you go, I tried to cover most of the interesting bits of our API with the examples above, though there’s much more available, for example, I didn’t cover the snapshot API (currently restricted to system containers) outside of the specific overlayfs case above and only scratched the surface of what’s possible to do with the attach function.

LXC 1.0 will release with a stable version of the API, we’ll be doing additions in the next few 1.x versions (while doing bugfix only updates to 1.0.x) and hope not to have to break the whole API for quite a while (though we’ll certainly be adding more stuff to it).

Read more
Stéphane Graber

This is post 7 out of 10 in the LXC 1.0 blog post series.

Introduction to unprivileged containers

The support of unprivileged containers is in my opinion one of the most important new features of LXC 1.0.

You may remember from previous posts that I mentioned that LXC should be considered unsafe because while running in a separate namespace, uid 0 in your container is still equal to uid 0 outside of the container, meaning that if you somehow get access to any host resource through proc, sys or some random syscalls, you can potentially escape the container and then you’ll be root on the host.

That’s what user namespaces were designed for and implemented. It was a multi-year effort to think them through and slowly push the hundreds of patches required into the upstream kernel, but finally with 3.12 we got to a point where we can start a full system container entirely as a user.

So how do those user namespaces work? Well, simply put, each user that’s allowed to use them on the system gets assigned a range of unused uids and gids, ideally a whole 65536 of them. You can then use those uids and gids with two standard tools called newuidmap and newgidmap which will let you map any of those uids and gids to virtual uids and gids in a user namespace.

That means you can create a container with the following configuration:

lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536

The above means that I have one uid map and one gid map defined for my container which will map uids and gids 0 through 65536 in the container to uids and gids 100000 through 165536 on the host.

For this to be allowed, I need to have those ranges assigned to my user at the system level with:

stgraber@castiana:~$ grep stgraber /etc/sub* 2>/dev/null
/etc/subgid:stgraber:100000:65536
/etc/subuid:stgraber:100000:65536

LXC has now been updated so that all the tools are aware of those unprivileged containers. The standard paths also have their unprivileged equivalents:

  • /etc/lxc/lxc.conf => ~/.config/lxc/lxc.conf
  • /etc/lxc/default.conf => ~/.config/lxc/default.conf
  • /var/lib/lxc => ~/.local/share/lxc
  • /var/lib/lxcsnaps => ~/.local/share/lxcsnaps
  • /var/cache/lxc => ~/.cache/lxc

Your user, while it can create new user namespaces in which it’ll be uid 0 and will have some of root’s privileges against resources tied to that namespace will obviously not be granted any extra privilege on the host.

One such thing is creating new network devices on the host or changing bridge configuration. To workaround that, we wrote a tool called “lxc-user-nic” which is the only SETUID binary part of LXC 1.0 and which performs one simple task.
It parses a configuration file and based on its content will create network devices for the user and bridge them. To prevent abuse, you can restrict the number of devices a user can request and to what bridge they may be added.

An example is my own /etc/lxc/lxc-usernet file:

stgraber veth lxcbr0 10

This declares that the user “stgraber” is allowed up to 10 veth type devices to be created and added to the bridge called lxcbr0.

Between what’s offered by the user namespace in the kernel and that setuid tool, we’ve got all that’s needed to run most distributions unprivileged.

Pre-requirements

All examples and instructions I’ll be giving below are expecting that you are running a perfectly up to date version of Ubuntu 14.04 (codename trusty). That’s a pre-release of Ubuntu so you may want to run it in a VM or on a spare machine rather than upgrading your production computer.

The reason to want something that recent is because the rough requirements for well working unprivileged containers are:

  • Kernel: 3.13 + a couple of staging patches (which Ubuntu has in its kernel)
  • User namespaces enabled in the kernel
  • A very recent version of shadow that supports subuid/subgid
  • Per-user cgroups on all controllers (which I turned on a couple of weeks ago)
  • LXC 1.0 beta2 or higher (released two days ago)
  • A version of PAM with a loginuid patch that’s yet to be in any released version

Those requirements happen to all be true of the current development release of Ubuntu as of two days ago.

LXC pre-built containers

User namespaces come with quite a few obvious limitations. For example in a user namespace you won’t be allowed to use mknod to create a block or character device as being allowed to do so would let you access anything on the host. Same thing goes with some filesystems, you won’t for example be allowed to do loop mounts or mount an ext partition, even if you can access the block device.

Those limitations while not necessarily world ending in day to day use are a big problem during the initial bootstrap of a container as tools like debootstrap, yum, … usually try to do some of those restricted actions and will fail pretty badly.

Some templates may be tweaked to work and workaround such as a modified fakeroot could be used to bypass some of those limitations but the goal of the LXC project isn’t to require all of our users to be distro engineers, so we came up with a much simpler solution.

I wrote a new template called “download” which instead of assembling the rootfs and configuration locally will instead contact a server which contains daily pre-built rootfs and configuration for most common templates.

Those images are built from our Jenkins server using a few machines I have on my home network (a set of powerful x86 builders and a quadcore ARM board). The actual build process is pretty straightforward, a basic chroot is assembled, then the current git master is downloaded, built and the standard templates are run with the right release and architecture, the resulting rootfs is compressed, a basic config and metadata (expiry, files to template, …) is saved, the result is pulled by our main server, signed with a dedicated GPG key and published on the public web server.

The client side is a simple template which contacts the server over https (the domain is also DNSSEC enabled and available over IPv6), grabs signed indexes of all the available images, checks if the requested combination of distribution, release and architecture is supported and if it is, grabs the rootfs and metadata tarballs, validates their signature and stores them in a local cache. Any container creation after that point is done using that cache until the time the cache entries expires at which point it’ll grab a new copy from the server.

The current list of images is (as can be requested by passing –list):

---
DIST      RELEASE   ARCH    VARIANT    BUILD
---
debian    wheezy    amd64   default    20140116_22:43
debian    wheezy    armel   default    20140116_22:43
debian    wheezy    armhf   default    20140116_22:43
debian    wheezy    i386    default    20140116_22:43
debian    jessie    amd64   default    20140116_22:43
debian    jessie    armel   default    20140116_22:43
debian    jessie    armhf   default    20140116_22:43
debian    jessie    i386    default    20140116_22:43
debian    sid       amd64   default    20140116_22:43
debian    sid       armel   default    20140116_22:43
debian    sid       armhf   default    20140116_22:43
debian    sid       i386    default    20140116_22:43
oracle    6.5       amd64   default    20140117_11:41
oracle    6.5       i386    default    20140117_11:41
plamo     5.x       amd64   default    20140116_21:37
plamo     5.x       i386    default    20140116_21:37
ubuntu    lucid     amd64   default    20140117_03:50
ubuntu    lucid     i386    default    20140117_03:50
ubuntu    precise   amd64   default    20140117_03:50
ubuntu    precise   armel   default    20140117_03:50
ubuntu    precise   armhf   default    20140117_03:50
ubuntu    precise   i386    default    20140117_03:50
ubuntu    quantal   amd64   default    20140117_03:50
ubuntu    quantal   armel   default    20140117_03:50
ubuntu    quantal   armhf   default    20140117_03:50
ubuntu    quantal   i386    default    20140117_03:50
ubuntu    raring    amd64   default    20140117_03:50
ubuntu    raring    armhf   default    20140117_03:50
ubuntu    raring    i386    default    20140117_03:50
ubuntu    saucy     amd64   default    20140117_03:50
ubuntu    saucy     armhf   default    20140117_03:50
ubuntu    saucy     i386    default    20140117_03:50
ubuntu    trusty    amd64   default    20140117_03:50
ubuntu    trusty    armhf   default    20140117_03:50
ubuntu    trusty    i386    default    20140117_03:50

The template has been carefully written to work on any system that has a POSIX compliant shell with wget. gpg is recommended but can be disabled if your host doesn’t have it (at your own risks).

The same template can be used against your own server, which I hope will be very useful for enterprise deployments to build templates in a central location and have them pulled by all the hosts automatically using our expiry mechanism to keep them fresh.

While the template was designed to workaround limitations of unprivileged containers, it works just as well with system containers, so even on a system that doesn’t support unprivileged containers you can do:

lxc-create -t download -n p1 -- -d ubuntu -r trusty -a amd64

And you’ll get a new container running the latest build of Ubuntu 14.04 amd64.

Using unprivileged LXC

Right, so let’s get you started, as I already mentioned, all the instructions below have only been tested on a very recent Ubuntu 14.04 (trusty) installation.
You may want to grab a daily build and run it in a VM.

Install the required packages:

  • sudo apt-get update
  • sudo apt-get dist-upgrade
  • sudo apt-get install lxc systemd-services uidmap

Then, assign yourself a set of uids and gids with:

  • sudo usermod --add-subuids 100000-165536 $USER
  • sudo usermod --add-subgids 100000-165536 $USER
  • sudo chmod +x $HOME

That last one is required because LXC needs it to access ~/.local/share/lxc/ after it switched to the mapped UIDs. If you’re using ACLs, you may instead use “u:100000:x” as a more specific ACL.

Now create ~/.config/lxc/default.conf with the following content:

lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:xx:xx:xx
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536

And /etc/lxc/lxc-usernet with:

<your username> veth lxcbr0 10

And that’s all you need. Now let’s create our first unprivileged container with:

lxc-create -t download -n p1 -- -d ubuntu -r trusty -a amd64

You should see the following output from the download template:

Setting up the GPG keyring
Downloading the image index
Downloading the rootfs
Downloading the metadata
The image cache is now ready
Unpacking the rootfs

---
You just created an Ubuntu container (release=trusty, arch=amd64).
The default username/password is: ubuntu / ubuntu
To gain root privileges, please use sudo.

So looks like your first container was created successfully, now let’s see if it starts:

ubuntu@trusty-daily:~$ lxc-start -n p1 -d
ubuntu@trusty-daily:~$ lxc-ls --fancy
NAME  STATE    IPV4     IPV6     AUTOSTART  
------------------------------------------
p1    RUNNING  UNKNOWN  UNKNOWN  NO

It’s running! At this point, you can get a console using lxc-console or can SSH to it by looking for its IP in the ARP table (arp -n).

One thing you probably noticed above is that the IP addresses for the container aren’t listed, that’s because unfortunately LXC currently can’t attach to an unprivileged container’s namespaces. That also means that some fields of lxc-info will be empty and that you can’t use lxc-attach. However we’re looking into ways to get that sorted in the near future.

There are also a few problems with job control in the kernel and with PAM, so doing a non-detached lxc-start will probably result in a rather weird console where things like sudo will most likely fail. SSH may also fail on some distros. A patch has been sent upstream for this, but I just noticed that it doesn’t actually cover all cases and even if it did, it’s not in any released version yet.

Quite a few more improvements to unprivileged containers are to come until the final 1.0 release next month and while we certainly don’t expect all workloads to be possible with unprivileged containers, it’s still a huge improvement on what we had before and a very good building block for a lot more interesting use cases.

Read more
Stéphane Graber

This is post 6 out of 10 in the LXC 1.0 blog post series.

When talking about container security most people either consider containers as inherently insecure or inherently secure. The reality isn’t so black and white and LXC supports a variety of technologies to mitigate most security concerns.

One thing to clarify right from the start is that you won’t hear any of the LXC maintainers tell you that LXC is secure so long as you use privileged containers. However, at least in Ubuntu, our default containers ship with what we think is a pretty good configuration of both the cgroup access and an extensive apparmor profile which prevents all attacks that we are aware of.

Below I’ll be covering the various technologies LXC supports to let you restrict what a container may do. Just keep in mind that unless you are using unprivileged containers, you shouldn’t give root access to a container to someone whom you’d mind having root access to your host.

Capabilities

The first security feature which was added to LXC was Linux capabilities support. With that feature you can set a list of capabilities that you want LXC to drop before starting the container or a full list of capabilities to retain (all others will be dropped).

The two relevant configurations options are:

  • lxc.cap.drop
  • lxc.cap.keep

Both are lists of capability names as listed in capabilities(7).

This may sound like a great way to make containers safe and for very specific cases it may be, however if running a system container, you’ll soon notice that dropping sys_admin and net_admin isn’t very practical and short of dropping those, you won’t make your container much safer (as root in the container will be able to re-grant itself any dropped capability).

In Ubuntu we use lxc.cap.drop to drop sys_module, mac_admin, mac_override, sys_time which prevent some known problems at container boot time.

Control groups

Control groups are interesting because they achieve multiple things which while interconnected are still pretty different:

  • Resource bean counting
  • Resource quotas
  • Access restrictions

The first two aren’t really security related, though resource quotas will let you avoid some obvious DoS of the host (by setting memory, cpu and I/O limits).

The last is mostly about the devices cgroup which lets you define which character and block devices a container may access and what it can do with them (you can restrict creation, read access and write access for each major/minor combination).

In LXC, configuring cgroups is done with the “lxc.cgroup.*” options which can roughly be defined as: lxc.cgroup.<controller>.<key> = <value>

For example to set a memory limit on p1 you’d add the following to its configuration:

lxc.cgroup.memory.limit_in_bytes = 134217728

This will set a memory limit of 128MB (the value is in bytes) and will be the equivalent to writing that same value to /sys/fs/cgroup/memory/lxc/p1/memory.limit_in_bytes

Most LXC templates only set a few devices controller entries by default:

# Default cgroup limits
lxc.cgroup.devices.deny = a
## Allow any mknod (but not using the node)
lxc.cgroup.devices.allow = c *:* m
lxc.cgroup.devices.allow = b *:* m
## /dev/null and zero
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm
## consoles
lxc.cgroup.devices.allow = c 5:0 rwm
lxc.cgroup.devices.allow = c 5:1 rwm
## /dev/{,u}random
lxc.cgroup.devices.allow = c 1:8 rwm
lxc.cgroup.devices.allow = c 1:9 rwm
## /dev/pts/*
lxc.cgroup.devices.allow = c 5:2 rwm
lxc.cgroup.devices.allow = c 136:* rwm
## rtc
lxc.cgroup.devices.allow = c 254:0 rm
## fuse
lxc.cgroup.devices.allow = c 10:229 rwm
## tun
lxc.cgroup.devices.allow = c 10:200 rwm
## full
lxc.cgroup.devices.allow = c 1:7 rwm
## hpet
lxc.cgroup.devices.allow = c 10:228 rwm
## kvm
lxc.cgroup.devices.allow = c 10:232 rwm

This configuration allows the container (usually udev) to create any device it wishes (that’s the wildcard “m” above) but block everything else (the “a” deny entry) unless it’s listed in one of the allow entries below. This covers everything a container will typically need to function.

You will find reasonably up to date documentation about the available controllers, control files and supported values at:
https://www.kernel.org/doc/Documentation/cgroups/

Apparmor

A little while back we added Apparmor profiles support to LXC.
The Apparmor support is rather simple, there’s one configuration option “lxc.aa_profile” which sets what apparmor profile to use for the container.

LXC will then setup the container and ask apparmor to switch it to that profile right before starting the container. Ubuntu’s LXC profile is rather complex as it aims to prevent any of the known ways of escaping a container or cause harm to the host.

As things are today, Ubuntu ships with 3 apparmor profiles meaning that the supported values for lxc.aa_profile are:

  • lxc-container-default (default value if lxc.aa_profile isn’t set)
  • lxc-container-default-with-nesting (same as default but allows some needed bits for nested containers)
  • lxc-container-default-with-mounting (same as default but allows mounting ext*, xfs and btrfs file systems).
  • unconfined (a special value which will disable apparmor support for the container)

You can also define your own by copying one of the ones in /etc/apparmor.d/lxc/, adding the bits you want, giving it a unique name, then reloading apparmor with “sudo /etc/init.d/apparmor reload” and finally setting lxc.aa_profile to the new profile’s name.

SELinux

The SELinux support is very similar to Apparmor’s. An SELinux context can be set using “lxc.se_context”.

An example would be:

lxc.se_context = unconfined_u:unconfined_r:lxc_t:s0-s0:c0.c1023

Similarly to Apparmor, LXC will switch to the new SELinux context right before starting init in the container. As far as I know, no distributions are setting a default SELinux context at this time, however most distributions build LXC with SELinux support (including Ubuntu, should someone choose to boot their host with SELinux rather than Apparmor).

Seccomp

Seccomp is a fairly recent kernel mechanism which allows for filtering of system calls.
As a user you can write a seccomp policy file and set it using “lxc.seccomp” in the container’s configuration. As always, this policy will only be applied to the running container and will allow or reject syscalls with a pre-defined return value.

An example (though limited and useless) of a seccomp policy file would be:

1
whitelist
103

Which would only allow syscall #103 (syslog) in the container and reject everything else.

Note that seccomp is a rather low level feature and only useful for some very specific use cases. All syscalls have to be referred by their ID instead of their name and those may change between architectures. Also, as things are today, if your host is 64bit and you load a seccomp policy file, all 32bit syscalls will be rejected. We’d need per-personality seccomp profiles to solve that but it’s not been a high priority so far.

User namespace

And last but not least, what’s probably the only way of making a container actually safe. LXC now has support for user namespaces. I’ll go into more details on how to use that feature in a later blog post but simply put, LXC is no longer running as root so even if an attacker manages to escape the container, he’d find himself having the privileges of a regular user on the host.

All this is achieved by assigning ranges of uids and gids to existing users. Those users on the host will then be allowed to clone a new user namespace in which all uids/gids are mapped to uids/gids that are part of the user’s range.

This obviously means that you need to allocate a rather silly amount of uids and gids to each user who’ll be using LXC in that way. In a perfect world, you’d allocate 65536 uids and gids per container and per user. As this would likely exhaust the whole uid/gid range rather quickly on some systems, I tend to go with “just” 65536 uids and gids per user that’ll use LXC and then have the same range shared by all containers.

Anyway, that’s enough details about user namespaces for now. I’ll cover how to actually set that up and use those unprivileged containers in the next post.

Read more
Stéphane Graber

This is post 5 out of 10 in the LXC 1.0 blog post series.

Storage backingstores

LXC supports a variety of storage backends (also referred to as backingstore).
It defaults to “none” which simply stores the rootfs under
/var/lib/lxc/<container>/rootfs but you can specify something else to lxc-create or lxc-clone with the -B option.

Currently supported values are:

directory based storage (“none” and “dir)

This is the default backingstore, the container rootfs is stored under
/var/lib/lxc/<container>/rootfs

The --dir option (when using “dir”) can be used to override the path.

btrfs

With this backingstore LXC will setup a new subvolume for the container which makes snapshotting much easier.

lvm

This one will use a new logical volume for the container.
The LV can be set with --lvname (the default is the container name).
The VG can be set with --vgname (the default is “lxc”).
The filesystem can be set with --fstype (the default is “ext4”).
The size can be set with --fssize (the default is “1G”).
You can also use LVM thinpools with --thinpool

overlayfs

This one is mostly used when cloning containers to create a container based on another one and storing any changes in an overlay.

When used with lxc-create it’ll create a container where any change done after its initial creation will be stored in a “delta0” directory next to the container’s rootfs.

zfs

Very similar to btrfs, as I’ve not used either of those myself I can’t say much about them besides that it should also create some kind of subvolume for the container and make snapshots and clones faster and more space efficient.

Standard paths

One quick word with the way LXC usually works and where it’s storing its files:

  • /var/lib/lxc (default location for containers)
  • /var/lib/lxcsnap (default location for snapshots)
  • /var/cache/lxc (default location for the template cache)
  • $HOME/.local/share/lxc (default location for unprivileged containers)
  • $HOME/.local/share/lxcsnap (default location for unprivileged snapshots)
  • $HOME/.cache/lxc (default location for unprivileged template cache)

The default path, also called lxcpath can be overridden on the command line with the -P option or once and for all by setting “lxcpath = /new/path” in /etc/lxc/lxc.conf (or $HOME/.config/lxc/lxc.conf for unprivileged containers).

The snapshot directory is always “snap” appended to lxcpath so it’ll magically follow lxcpath. The template cache is unfortunately hardcoded and can’t easily be moved short of relying on bind-mounts or symlinks.

The default configuration used for all containers at creation time is taken from
/etc/lxc/default.conf (no unprivileged equivalent yet).
The templates themselves are stored in /usr/share/lxc/templates.

Cloning containers

All those backingstores only really shine once you start cloning containers.
For example, let’s take our good old “p1” Ubuntu container and let’s say you want to make a usable copy of it called “p4”, you can simply do:

sudo lxc-clone -o p1 -n p4

And there you go, you’ve got a working “p4” container that’ll be a simple copy of “p1” but with a new mac address and its hostname properly set.

Now let’s say you want to do a quick test against “p1” but don’t want to alter that container itself, yet you don’t want to wait the time needed for a full copy, you can simply do:

sudo lxc-clone -o p1 -n p1-test -B overlayfs -s

And there you go, you’ve got a new “p1-test” container which is entirely based on the “p1” rootfs and where any change will be stored in the “delta0” directory of “p1-test”.
The same “-s” option also works with lvm and btrfs (possibly zfs too) containers and tells lxc-clone to use a snapshot rather than copy the whole rootfs across.

Snapshotting

So cloning is nice and convenient, great for things like development environments where you want throw away containers. But in production, snapshots tend to be a whole lot more useful for things like backup or just before you do possibly risky changes.

In LXC we have a “lxc-snapshot” tool which will let you create, list, restore and destroy snapshots of your containers.
Before I show you how it works, please note that “lxc-snapshot” currently doesn’t appear to work with directory based containers. With those it produces an empty snapshot, this should be fixed by the time LXC 1.0 is actually released.

So, let’s say we want to backup our “p1-lvm” container before installing “apache2” into it, simply run:

echo "before installing apache2" > snap-comment
sudo lxc-snapshot -n p1-lvm -c snap-comment

At which point, you can confirm the snapshot was created with:

sudo lxc-snapshot -n p1-lvm -L -C

Now you can go ahead and install “apache2” in the container.

If you want to revert the container at a later point, simply use:

sudo lxc-snapshot -n p1-lvm -r snap0

Or if you want to restore a snapshot as its own container, you can use:

sudo lxc-snapshot -n p1-lvm -r snap0 p1-lvm-snap0

And you’ll get a new “p1-lvm-snap0” container which will contain a working copy of “p1-lvm” as it was at “snap0”.

Read more