Canonical Voices

K. Tsakalozos

MicroK8s in the Wild

As the popularity of MicroK8s grows I would like to take the time to mention some projects that use this micro Kubernetes distribution. But before that, let me do some introductions. For those unfamiliar, Kubernetes is an open source container orchestrator: it handles deploying, upgrading, and provisioning your applications. This is one of the rare occasions where all the major players (Google, Microsoft, IBM, Amazon etc.) have flocked around a single framework, making it an unofficial standard.

MicroK8s is a distribution of Kubernetes. It is a snap package that sets up a Kubernetes cluster on your machine. You can have a Kubernetes cluster for local development, CI/CD, or just for getting to know Kubernetes, with a single command:

sudo snap install microk8s --classic

If you are on a Mac or Windows you will need a Linux VM.

In what follows you will find some examples of how people are using MicroK8s. Note that this is not a complete list of MicroK8s usages; it is just some efforts I happen to be aware of.

Spring Cloud Kubernetes

This project uses CircleCI for CI/CD. MicroK8s provides a local Kubernetes cluster where integration tests are run. The addons enabled are dns, the docker registry and Istio. The integration tests need to plug into the Kubernetes cluster using the kubeconfig file and the socket to dockerd. This work was introduced in this Pull Request (thanks George) and it gave us the incentive to add a microk8s.status command that waits for the cluster to come online. For example, we can wait up to 5 minutes for MicroK8s to come up with:

microk8s.status --wait-ready --timeout=300
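
For completeness, the addons mentioned above can be enabled with a single command; assuming a reasonably recent MicroK8s release, that would be something like:

microk8s.enable dns registry istio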

OpenFaaS on MicroK8s

It was at this year’s Config Management Camp that I met Joe McCobe, the author of “Deploy OpenFaaS with MicroK8s”. I will just repeat his words: he “was blown away by the speed and ease with which I could get a basic lab environment up and running”.

What about Kubeless?

It seems the ease of deploying MicroK8s pairs well with the ease of development that serverless frameworks offer. Users of Kubeless are also kicking the tires on MicroK8s. Have a look at “Files upload from Kubeless on MicroK8s to Minio” and “Serverless MicroK8s Kubernetes”.

SUSE Cloud Application Platform (CAP) on MicroK8s

In his blog post, Dimitris describes in detail all the configuration he had to do to get the software from SUSE to run on MicroK8s. The most interesting part is the motivation behind this effort. As he says, “… MicroK8s… use your machine’s resources without you having to decide on a VM size beforehand.” As he explained to me, his application puts significant pressure on memory only during bootstrap. MicroK8s enabled him to reclaim the unused memory after the initialization phase.

Kubeflow

Kubeflow is the missing link between Kubernetes and AI/ML. Canonical is actively involved in this so… you should definitely check it out. Sure, I am biased, but let me tell you a true story. I have a friend who was given three machines to deploy TensorFlow and run some experiments. She did not have any prior experience at the time so… none of the three nodes was set up in exactly the same way. There was always something off. This head-scratching situation is just one reason to use Kubeflow.

Transcrobes

Transcrobes comes from an active member of the MicroK8s community. It serves as a language learning aid. “The system knows what you know, so can give you just the right amount of help to be able to understand the words you don’t know but gets out of the way for the stuff you do know.” Here MicroK8s is used for quick prototyping. We wish you all the best Anton, good luck!

Summing Up

We have seen a number of interesting use cases that include CI/CD, Serverless programming, lab setup, rapid prototyping and application development. If you have a MicroK8s use case do let us know. Come and say hi at #microk8s on the Kubernetes slack and/or issue a Pull Request against our MicroK8s In The Wild page.


Read more
Christian Brauner

Runtimes And the Curse of the Privileged Container

Introduction (CVE-2019-5736)

Today, Monday, 2019-02-11, 14:00:00 CET, CVE-2019-5736 was released:

The vulnerability allows a malicious container to (with minimal user interaction) overwrite the host runc binary and thus gain root-level code execution on the host. The level of user interaction is being able to run any command (it doesn’t matter if the command is not attacker-controlled) as root within a container in either of these contexts:

  • Creating a new container using an attacker-controlled image.
  • Attaching (docker exec) into an existing container which the attacker had previous write access to.

I’ve been working on a fix for this issue over the last couple of weeks together with Aleksa, a friend of mine and maintainer of runC. When he notified me about the issue in runC we tried to come up with an exploit for LXC as well and, though harder, it is doable. I was interested in the issue for technical reasons and figuring out how to reliably fix it was quite fun (with a proper dose of pure hatred). It also caused me to finally write down some personal thoughts I had for a long time about how we are running containers.

What are Privileged Containers?

At first glance this is a question that is probably trivial to anyone with a decent low-level understanding of containers. Maybe even most users by now will know what a privileged container is. A first pass at defining it would be to say that a privileged container is a container that is owned by root. Looking closer, this seems an insufficient definition. What about containers using user namespaces that are started as root? It seems we need to distinguish between what ids a container is running with. So we could say a privileged container is a container that is running as root. However, this is still wrong, because “running as root” can either mean “running as root as seen from the outside” or “running as root from the inside”, where “outside” means “as seen from a task outside the container” and “inside” means “as seen from a task inside the container”.

What we really mean by a privileged container is a container where the semantics for id 0 are the same inside and outside of the container ceteris paribus. I say “ceteris paribus” because using LSMs, seccomp or any other security mechanism will not cause a change in the meaning of id 0 inside and outside the container. For example, a breakout caused by a bug in the runtime implementation will give you root access on the host.

An unprivileged container then simply is any container in which the semantics for id 0 inside the container are different from id 0 outside the container. For example, a breakout caused by a bug in the runtime implementation will not give you root access on the host by default. This should only be possible if the kernel’s user namespace implementation has a bug.

The reason why I like to define privileged containers this way is that it also lets us handle edge cases. Specifically, the case where a container is using a user namespace but a hole is punched into the idmapping at id 0 aka where id 0 is mapped through. Consider a container that uses the following idmappings:

id: 0 100000 100000

This instructs the kernel to setup the following mapping:

id: container_id(0) -> host_id(100000)
id: container_id(1) -> host_id(100001)
id: container_id(2) -> host_id(100002)
.
.
.

container_id(99999) -> host_id(199999)

With this mapping it’s evident that container_id(0) != host_id(0). But now consider the following mapping:

id: 0 0 1
id: 1 100001 99999

This instructs the kernel to setup the following mapping:

id: container_id(0) -> host_id(0)
id: container_id(1) -> host_id(100001)
id: container_id(2) -> host_id(100002)
.
.
.

container_id(99999) -> host_id(199999)

In contrast to the first example this has the consequence that container_id(0) == host_id(0). I would argue that any container that at least punches a hole for id 0 into its idmapping up to specifying an identity mapping is to be considered a privileged container.
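
One cheap way to see which situation you are in from inside a container is to read /proc/self/uid_map. This is just a minimal sketch of the check, not anything LXC ships:

# Each line of /proc/self/uid_map is: <id inside ns> <id outside ns> <range>.
# If container id 0 maps to host id 0, id 0 is mapped through and the
# container is privileged in the sense defined above.
awk '$1 == 0 { if ($2 == 0) print "privileged: id 0 is mapped through";
               else print "unprivileged: id 0 maps to host id " $2 }' /proc/self/uid_map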

As a sidenote, Docker containers run as privileged containers by default. There is usually some confusion here: people think that because they do not use the --privileged flag their Docker containers run unprivileged. This is wrong. What the --privileged flag does is give you even more permissions, e.g. by not dropping (specific or even any) capabilities. One could say that such containers are almost “super-privileged”.

The Trouble with Privileged Containers

The problem I see with privileged containers is essentially captured by LXC’s and LXD’s upstream security position which we have held since at least 2015 but probably even earlier. I’m quoting from our notes about privileged containers:

Privileged containers are defined as any container where the container uid 0 is mapped to the host’s uid 0. In such containers, protection of the host and prevention of escape is entirely done through Mandatory Access Control (apparmor, selinux), seccomp filters, dropping of capabilities and namespaces.

Those technologies combined will typically prevent any accidental damage of the host, where damage is defined as things like reconfiguring host hardware, reconfiguring the host kernel or accessing the host filesystem.

LXC upstream’s position is that those containers aren’t and cannot be root-safe.

They are still valuable in an environment where you are running trusted workloads or where no untrusted task is running as root in the container.

We are aware of a number of exploits which will let you escape such containers and get full root privileges on the host. Some of those exploits can be trivially blocked and so we do update our different policies once made aware of them. Some others aren’t blockable as they would require blocking so many core features that the average container would become completely unusable.

[…]

As privileged containers are considered unsafe, we typically will not consider new container escape exploits to be security issues worthy of a CVE and quick fix. We will however try to mitigate those issues so that accidental damage to the host is prevented.

LXC’s upstream position for a long time has been that privileged containers are not and cannot be root safe. For something to be considered root safe it should be safe to hand root access to third parties or tasks.

Running Untrusted Workloads in Privileged Containers

is insane. That’s about everything that this paragraph should contain. The fact that the semantics for id 0 inside and outside the container are identical entails that any meaningful container escape will have the attacker gain root on the host.

CVE-2019-5736 Is a Very Very Very Bad Privilege Escalation to Host Root

CVE-2019-5736 is an excellent illustration of such an attack. Think about it: a process running inside a privileged container can rather trivially corrupt the binary that is used to attach to the container. This allows an attacker to create a custom ELF binary on the host. That binary could do anything it wants:

  • could just be a binary that calls poweroff
  • could be a binary that spawns a root shell
  • could be a binary that kills other containers when called again to attach
  • could be suid cat
  • .
  • .
  • .

The attack vector is actually slightly worse for runC due to its architecture. Since runC exits after spawning the container it can also be attacked through a malicious container image. That is super bad given that a lot of container workflows rely on downloading images from the web.

LXC cannot be attacked through a malicious image since the monitor process (a singleton per container) never exits during the container’s life cycle. Since the kernel does not allow modifications to running binaries it is not possible for the attacker to corrupt it. When the container is shut down or killed the attacking task will be killed before it can do any harm. Only when the last process running inside the container has exited will the monitor itself exit. This has the consequence that if you run privileged OCI containers via our oci template with LXC you are not vulnerable to malicious images. Only the vector through the attaching binary still applies.

The Lie That Privileged Containers Can Be Safe

Aside from mostly working on the Kernel I’m also a maintainer of LXC and LXD alongside Stéphane Graber. We are responsible for LXC - the low-level container runtime - and LXD - the container management daemon using LXC. We have made a very conscious decision to consider privileged containers not root safe. Two main corollaries follow from this:

  1. Privileged containers should never be used to run untrusted workloads.
  2. Breakouts from privileged containers are not considered CVEs by our security policy.

It still seems a common belief that if we all just try hard enough, using privileged containers for untrusted workloads is safe. This is not a promise that can be made good upon. A privileged container is not a security boundary. The reason for this is simply what we looked at above: container_id(0) == host_id(0). It is therefore deeply troubling that this industry is happy to let users believe that they are safe and secure using privileged containers.

Unprivileged Containers as Default

As upstream for LXC and LXD we have been advocating the use of unprivileged containers by default for years, long before anyone else did. Our low-level library LXC has supported unprivileged containers since 2013, when user namespaces were merged into the kernel. With LXD we have taken it one step further and made unprivileged containers the default and privileged containers opt-in, for that very reason: privileged containers aren’t safe. We even allow you to have per-container idmappings to make sure that not just each container is isolated from the host but also all containers from each other.

For years we have been advocating for unprivileged containers at conferences, in blog posts, and whenever we have spoken to people, but somehow this whole industry has chosen to rely on privileged containers.

The good news is that we are seeing changes as people become more familiar with the perils of privileged containers. Let this recent CVE be another reminder that unprivileged containers need to be the default.

Are LXC and LXD affected?

I have seen this question asked all over the place so I guess I should add a section about this too:

  • Unprivileged LXC and LXD containers are not affected.

  • Any privileged LXC and LXD container running on a read-only rootfs is not affected.

  • Privileged LXC containers as defined above are affected, though the attack is more difficult than for runC. The reason for this is that the lxc-attach binary does not exit before the program in the container has finished executing. This means an attacker would need to open an O_PATH file descriptor to /proc/self/exe, fork() itself into the background, re-open the O_PATH file descriptor through /proc/self/fd/<O_PATH-nr> in a loop as O_WRONLY, and keep trying to write to the binary until lxc-attach exits. Before that the write will not succeed since the kernel will not allow modification of a running binary.

  • Privileged LXD containers are only affected if the daemon is restarted other than for upgrade reasons. This should basically never happen. The LXD daemon never exits so any write will fail because the kernel does not allow modification of a running binary. If the LXD daemon is restarted because of an upgrade the binary will be swapped out and the file descriptor used for the attack will write to the old in-memory binary and not to the new binary.

Chromebooks with Crostini using LXD are not affected

Chromebooks, which use LXD as their default container runtime, are not affected. First of all, all binaries reside on a read-only filesystem and second, LXD does not allow running privileged containers on Chromebooks through the LXD_UNPRIVILEGED_ONLY flag. For more details see this link.

Fixing CVE-2019-5736

To prevent this attack, LXC has been patched to create a temporary copy of the calling binary itself when it attaches to containers (cf. 6400238d08cdf1ca20d49bafb85f4e224348bf9d). To do this LXC can be instructed to create an anonymous, in-memory file using the memfd_create() system call and to copy itself into the temporary in-memory file, which is then sealed to prevent further modifications. LXC then executes this sealed, in-memory file instead of the original on-disk binary. Any compromising write operations from a privileged container to the host LXC binary will then write to the temporary in-memory binary and not to the host binary on-disk, preserving the integrity of the host LXC binary. And since the temporary in-memory LXC binary is sealed, writes to it will fail as well. To avoid breaking downstream users of the shared library this is opt-in by setting LXC_MEMFD_REXEC in the environment. For our lxc-attach binary, which is the only attack vector, this is now done by default.

Workloads that place the LXC binaries on a read-only filesystem or prevent running privileged containers can disable this feature by passing --disable-memfd-rexec during the configure stage when compiling LXC.

Read more

Snapcraft 3.1

snapcraft 3.1 is now available on the stable channel of the Snap Store. This is a new minor release building on top of the foundations laid out from the snapcraft 3.0 release. If you are already on the stable channel for snapcraft then all you need to do is wait for the snap to be refreshed. The full release notes are replicated below.

Build Environments

It is now possible, when using the base keyword, to once again clean parts.

Read more
jdstrand

Some time ago we started alerting publishers when their stage-packages received a security update since the last time they built a snap. We wanted to strike the right balance for the alerts and so the service currently will only alert you when there are new security updates against your stage-packages. In this manner, you can choose not to rebuild your snap (e.g., since it doesn’t use the affected functionality of the vulnerable package) and not be nagged every day that you are out of date.

As nice as that is, sometimes you want to check these things yourself or perhaps hook the alerts into some form of automation or tool. While the review-tools had all of the pieces so you could do this, it wasn’t as straightforward as it could be. Now with the latest stable revision of the review-tools, this is easy:

$ sudo snap install review-tools
$ review-tools.check-notices \
  ~/snap/review-tools/common/review-tools_656.snap
{'review-tools': {'656': {'libapt-inst2.0': ['3863-1'],
                          'libapt-pkg5.0': ['3863-1'],
                          'libssl1.0.0': ['3840-1'],
                          'openssl': ['3840-1'],
                          'python3-lxml': ['3841-1']}}}

The review-tools is a strict-mode snap and, while it plugs the home interface, that is only for convenience, so I typically disconnect the interface and put things in its SNAP_USER_COMMON directory, like I did above.

Since it is now super easy to check a snap on disk, with a little scripting and a cron job you can generate a machine-readable report whenever you want. E.g., you can do something like the following:

$ cat ~/bin/check-snaps
#!/bin/sh
set -e

snaps="review-tools/stable rsync-jdstrand/edge"

tmpdir=$(mktemp -d -p "$HOME/snap/review-tools/common")
cleanup() {
    rm -fr "$tmpdir"
}
trap cleanup EXIT HUP INT QUIT TERM

cd "$tmpdir" || exit 1
for i in $snaps ; do
    snap=$(echo "$i" | cut -d '/' -f 1)
    channel=$(echo "$i" | cut -d '/' -f 2)
    snap download "$snap" "--$channel" >/dev/null
done
cd - >/dev/null || exit 1

/snap/bin/review-tools.check-notices "$tmpdir"/*.snap

or if you already have the snaps on disk somewhere, just do:

$ /snap/bin/review-tools.check-notices /path/to/snaps/*.snap

Now you can add the above to cron or some automation tool as a reminder of what needs updates. Enjoy!

Read more
K. Tsakalozos

No.

No.

Read more
Ghost

Welcome to Ghost

Welcome to Ghost

Read more
Ghost

Writing posts with Ghost ✍️

Ghost has a powerful visual editor with familiar formatting options, as well as the ability to seamlessly add dynamic content.

Select the text to add formatting, headers or create links, or use Markdown shortcuts to do the work for you - if that's your thing.


Rich editing at your fingertips

The editor can also handle rich media objects, called cards.

You can insert a card either by clicking the + button on a new line, or typing / on a new line to search for a particular card. This allows you to efficiently insert images, markdown, html and embeds.

For example:

  • Insert a video from YouTube directly into your content by pasting the URL
  • Create unique content like a button or content opt-in using the HTML card
  • Need to share some code? Embed code blocks directly
<header class="site-header outer">
    <div class="inner">
        {{> "site-nav"}}
    </div>
</header>

Working with images in posts

You can add images to your posts in many ways:

  • Upload from your computer
  • Click and drag an image into the browser
  • Paste directly into the editor from your clipboard
  • Insert using a URL

Once inserted you can blend images beautifully into your content at different sizes and add captions wherever needed.


The post settings menu and publishing options can be found in the top right hand corner. For more advanced tips on post settings check out the publishing options post!

Read more
Ghost

Publishing options

Publishing options

The Ghost editor has everything you need to fully optimise your content. This is where you can add tags and authors, feature a post, or turn a post into a page.

Access the post settings menu in the top right hand corner of the editor.

Post feature image

Insert your post feature image from the very top of the post settings menu. Consider resizing or optimising your image first to ensure it's an appropriate size.

Structured data & SEO

Customise your social media sharing cards for Facebook and Twitter, enabling you to add custom images, titles and descriptions for social media.

There’s no need to hard code your meta data. You can set your meta title and description using the post settings tool, which has a handy character guide and SERP preview.

Ghost will automatically implement structured data for your publication using JSON-LD to further optimise your content.

{
    "@context": "https://schema.org",
    "@type": "Article",
    "publisher": {
        "@type": "Organization",
        "name": "Publishing options",
        "logo": "https://static.ghost.org/ghost-logo.svg"
    },
    "author": {
        "@type": "Person",
        "name": "Ghost",
        "url": "http://demo.ghost.io/author/ghost/",
        "sameAs": []
    },
    "headline": "Publishing options",
    "url": "http://demo.ghost.io/publishing-options",
    "datePublished": "2018-08-08T11:44:00.000Z",
    "dateModified": "2018-08-09T12:06:21.000Z",
    "keywords": "Getting Started",
    "description": "The Ghost editor has everything you need to fully optimise your content. This is where you can add tags and authors, feature a post, or turn a post into a page.",
    }
}
    

You can test that the structured data schema on your site is working as it should using Google’s structured data tool.

Code Injection

This tool allows you to inject code on a per post or page basis, or across your entire site. This means you can modify CSS, add unique tracking codes, or add other scripts to the head or foot of your publication without making edits to your theme files.

To add code site-wide, use the code injection tool in the main admin menu. This is useful for adding a Facebook Pixel, a Google Analytics tracking code, or to start tracking with any other analytics tool.

To add code to a post or page, use the code injection tool within the post settings menu. This is useful if you want to add art direction, scripts or styles that are only applicable to one post or page.

From here, you might be interested in managing some more specific admin settings!

Read more
Ghost

Managing admin settings

Managing admin settings

There are a couple of things to do next while you're getting set up:

Make your site private

If you've got a publication that you don't want the world to see yet because it's not ready to launch, you can hide your Ghost site behind a basic shared pass-phrase.

You can toggle this preference on at the bottom of Ghost's General Settings:


Ghost will give you a short, randomly generated pass-phrase which you can share with anyone who needs access to the site while you're working on it. While this setting is enabled, all search engine optimisation features will be switched off to help keep your site under the radar.

Do remember though, this is not secure authentication. You shouldn't rely on this feature for protecting important private data. It's just a simple, shared pass-phrase for some very basic privacy.


Invite your team

Ghost has a number of different user roles for your team:

Contributors
This is the base user level in Ghost. Contributors can create and edit their own draft posts, but they are unable to edit drafts of others or publish posts. Contributors are untrusted users with the most basic access to your publication.

Authors
Authors are the 2nd user level in Ghost. Authors can write, edit and publish their own posts. Authors are trusted users. If you don't trust users to be allowed to publish their own posts, they should be set as Contributors.

Editors
Editors are the 3rd user level in Ghost. Editors can do everything that an Author can do, but they can also edit and publish the posts of others - as well as their own. Editors can also invite new Contributors and Authors to the site.

Administrators
The top user level in Ghost is Administrator. Again, administrators can do everything that Authors and Editors can do, but they can also edit all site settings and data, not just content. Additionally, administrators have full access to invite, manage or remove any other user of the site.

The Owner
There is only ever one owner of a Ghost site. The owner is a special user which has all the same permissions as an Administrator, but with two exceptions: The Owner can never be deleted. And in some circumstances the owner will have access to additional special settings if applicable. For example: billing details, if using Ghost(Pro).

It's a good idea to ask all of your users to fill out their user profiles, including bio and social links. These will populate rich structured data for posts and generally create more opportunities for themes to fully populate their design.

Next up: Organising your content

Read more
Ghost

Organising your content

Organising your content

Ghost has a flexible organisational taxonomy called tags which can be used to configure your site structure using dynamic routing.

Basic Tagging

You can think of tags like Gmail labels. By tagging posts with one or more keywords, you can organise articles into buckets of related content.

When you create content for your publication you can assign tags to help differentiate between categories of content.

For example you may tag some content with News and other content with Podcast, which would create two distinct categories of content listed on /tag/news/ and /tag/podcast/, respectively.

If you tag a post with both News and Podcast - then it appears in both sections. Tag archives are like dedicated home-pages for each category of content that you have. They have their own pages, their own RSS feeds, and can support their own cover images and meta data.

The primary tag

Inside the Ghost editor, you can drag and drop tags into a specific order. The first tag in the list is always given the most importance, and some themes will only display the primary tag (the first tag in the list) by default.

News, Technology, Startup

So you can add the most important tag which you want to show up in your theme, but also add related tags which are less important.

Private tags

Sometimes you may want to assign a post a specific tag, but you don't necessarily want that tag appearing in the theme or creating an archive page. In Ghost, hashtags are private and can be used for special styling.

For example, if you sometimes publish posts with video content - you might want your theme to adapt and get rid of the sidebar for these posts, to give more space for an embedded video to fill the screen. In this case, you could use private tags to tell your theme what to do.

News, #video

Here, the theme would assign the post publicly displayed tags of News - but it would also keep a private record of the post being tagged with #video. In your theme, you could then look for private tags conditionally and give them special formatting.
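
To make this concrete, a theme template could branch on the private tag with Ghost's has helper. A rough sketch (the layout classes and the "sidebar" partial are made up for illustration):

{{#post}}
    {{#has tag="#video"}}
        {{!-- full-width layout without the sidebar for video posts --}}
        <main class="content content--full-width">{{content}}</main>
    {{else}}
        {{!-- default layout with the sidebar --}}
        <main class="content">{{content}}</main>
        {{> "sidebar"}}
    {{/has}}
{{/post}}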

You can find documentation for theme development techniques like this and many more over on Ghost's extensive theme documentation.

Dynamic Routing

Dynamic routing gives you the ultimate freedom to build a custom publication to suit your needs. Routes are rules that map URL patterns to your content and templates.

For example, you may not want content tagged with News to exist on example.com/tag/news. Instead, you want it to exist on example.com/news.

In this case you can use dynamic routes to create customised collections of content on your site. It's also possible to use multiple templates in your theme to render each content type differently.

There are lots of use cases for dynamic routing with Ghost, here are a few common examples:

  • Setting a custom home page with its own template
  • Having separate content hubs for blog and podcast, that render differently, and have custom RSS feeds to support two types of content
  • Creating a founders column as a unique view, by filtering content created by specific authors
  • Including dates in permalinks for your posts
  • Setting posts to have a URL relative to their primary tag like example.com/europe/story-title/

Dynamic routing can be configured in Ghost using YAML files. Read our dynamic routing documentation for further details.

You can further customise your site using Apps & Integrations.

Read more
Ghost

Apps & integrations

Apps & integrations

There are three primary ways to work with third-party services in Ghost: using Zapier, editing your theme, or using the Ghost API.

Zapier

You can connect your Ghost site to over 1,000 external services using the official integration with Zapier.

Zapier sets up automations with Triggers and Actions, which allows you to create and customise a wide range of connected applications.

Example: When someone new subscribes to a newsletter on a Ghost site (Trigger) then the contact information is automatically pushed into MailChimp (Action).


Editing your theme

One of the biggest advantages of using Ghost over centralised platforms is that you have total control over the front end of your site. Either customise your existing theme, or create a new theme from scratch with our Theme SDK.

You can integrate any front end code into a Ghost theme, and it will work just fine. No restrictions!

Here are some common examples:

  • Include comments on a Ghost blog with Disqus or Discourse
  • Implement MathJAX with a little bit of JavaScript
  • Add syntax highlighting to your code snippets using Prism.js
  • Integrate any dynamic forms from Google or Typeform to capture data
  • Just about anything which uses JavaScript, APIs and Markup.

Using the Public API

Ghost itself is driven by a set of core APIs, and so you can access the Public Ghost JSON API from external webpages or applications in order to pull data and display it in other places.

The Ghost API is thoroughly documented and straightforward to work with for developers of almost any level.

Alright, the last post in our welcome-series! If you're curious about creating your own Ghost theme from scratch, here are some more details on how that works.

Read more
Ghost

Creating a custom theme

Creating a custom theme

Ghost comes with a beautiful default theme called Casper, which is designed to be a clean, readable publication layout and can be adapted for most purposes. However, Ghost can also be completely themed to suit your needs. Rather than just giving you a few basic settings which act as a poor proxy for code, we just let you write code.

There are a huge range of both free and premium pre-built themes which you can get from the Ghost Theme Marketplace, or you can create your own from scratch.

Anyone can write a completely custom Ghost theme with some solid knowledge of HTML and CSS

Ghost themes are written with a templating language called handlebars, which has a set of dynamic helpers to insert your data into template files. For example: {{author.name}} outputs the name of the current author.

The best way to learn how to write your own Ghost theme is to have a look at the source code for Casper, which is heavily commented and should give you a sense of how everything fits together.

  • default.hbs is the main template file, all contexts will load inside this file unless specifically told to use a different template.
  • post.hbs is the file used in the context of viewing a post.
  • index.hbs is the file used in the context of viewing the home page.
  • and so on

We've got full and extensive theme documentation which outlines every template file, context and helper that you can use.

If you want to chat with other people making Ghost themes to get any advice or help, there's also a themes section on our public Ghost forum.

Read more
Colin Watson

Git per-branch permissions

We’ve had Git hosting support in Launchpad for a few years now. One thing that some users asked for, particularly larger users such as the Ubuntu kernel team, was the ability to set up per-branch push permissions for their repositories. Today we rolled out the last piece of this work.

Launchpad’s default behaviour is that repository owners may push anything to their own repositories, including creating new branches, force-pushing (rewriting history), and deleting branches, while nobody else may push anything. Repository owners can now also choose to protect branches or tags, either individually or using wildcard rules. If a branch is protected, then by default repository owners can only create or push it but cannot force-push or delete; if a tag is protected, then by default repository owners can create it but cannot move or delete it.

You can also allow selected contributors to push to protected branches or tags, so if you’re collaborating with somebody on a branch and just want to be able to quickly pair-program via git push, or you want a merge robot to be able to land merge proposals in your repository without having to add it to the team that owns the repository and thus give it privileges it doesn’t need, then this feature may be for you.

There’s some initial documentation on our help site, and here’s a screenshot of a repository that’s been set up to give a contributor push access to a single branch:

Read more
Christian Brauner

Android Binderfs


Introduction

Android Binder is an inter-process communication (IPC) mechanism. It is heavily used in all Android devices. The binder kernel driver has been present in the upstream Linux kernel for quite a while now.

Binder has been a controversial patchset (see this lwn article as an example). Its design was considered wrong and to violate certain core kernel design principles (e.g. a task should never touch another task’s file descriptor table). Most kernel developers were not a fan of binder.

Recently, the upstream binder code has fortunately been reworked significantly (e.g. it does not touch another task’s file descriptor table anymore, the locking is very fine-grained now, etc.).

With Android being one of the major operating systems (OS) for a vast number of devices there is simply no way around binder.

The Android Service Manager

The binder IPC mechanism is accessible from userspace through device nodes located at /dev. A modern Android system will allocate three device nodes:

  • /dev/binder
  • /dev/hwbinder
  • /dev/vndbinder

serving different purposes. However, the logic is the same for all three of them. A process can call open(2) on those device nodes to receive an fd which it can then use to issue requests via ioctl(2)s. Android has a service manager which is used to translate addresses to bus names and only the address of the service manager itself is well-known. The service manager is registered through an ioctl(2) and there can only be a single service manager. This means once a service manager has grabbed hold of binder devices they cannot be (easily) reused by a second service manager.

Running Android in Containers

This matters as soon as multiple instances of Android are supposed to be run, since they will all need their own private binder devices. This is a use-case that arises pretty naturally when running Android in system containers. People have been doing this for a long time with LXC. A project that has set out to make running Android in LXC containers very easy is Anbox. Anbox makes it possible to run hundreds of Android containers.

To properly run Android in a container it is necessary that each container has a set of private binder devices.

Statically Allocating binder Devices

Binder devices are currently statically allocated at compile time. Before compiling a kernel the CONFIG_ANDROID_BINDER_DEVICES option needs to be set in the kernel config (Kconfig), containing the names of the binder devices to allocate at boot. By default it is set as:

CONFIG_ANDROID_BINDER_DEVICES="binder,hwbinder,vndbinder"

To allocate additional binder devices the user needs to specify them with this Kconfig option. This is problematic since users need to know how many containers they will run at maximum and then to calculate the number of devices they need so they can specify them in the Kconfig. When the maximum number of needed binder devices changes after kernel compilation the only way to get additional devices is to recompile the kernel.

Problem 1: Using the misc major Device Number

This situation is aggravated by the fact that binder devices use the misc major number in the kernel. Each device node in the Linux kernel is identified by a major and minor number. A device can request its own major number. If it does, it will have an exclusive range of minor numbers it doesn’t share with anything else and is free to hand out. Or it can use the misc major number. The misc major number is shared amongst different devices. However, that also means the number of minor devices that can be handed out is limited by all users of misc major. So if a user requests a very large number of binder devices in their Kconfig they might make it impossible for anyone else to allocate minor numbers, or there simply might not be enough left to allocate for binder itself.

Problem 2: Containers and IPC namespaces

All of those binder devices requested in the Kconfig via CONFIG_ANDROID_BINDER_DEVICES will be allocated at boot and placed in the host’s devtmpfs mount usually located at /dev, or - depending on the udev(7) implementation - will be created via mknod(2) by udev(7) at boot. That means all of those devices initially belong to the host IPC namespace. However, containers usually run in their own IPC namespace separate from the host’s. But when binder devices located in /dev are handed to containers (e.g. with a bind-mount) the kernel driver will not know that these devices are now used in a different IPC namespace since the driver is not IPC namespace aware. This is not a serious technical issue but a serious conceptual one. There should be a way to have per-IPC-namespace binder devices.

Enter binderfs

To solve both problems we came up with a solution that I presented at the Linux Plumbers Conference in Vancouver this year. There’s a video of that presentation available on YouTube.

Android binderfs is a tiny filesystem that allows users to dynamically allocate binder devices, i.e. it allows adding and removing binder devices at runtime, which means it solves problem 1. Additionally, binder devices located in a new binderfs instance are independent of binder devices located in another binderfs instance. All binder devices in binderfs instances are also independent of the binder devices allocated during boot specified in CONFIG_ANDROID_BINDER_DEVICES. This means binderfs solves problem 2.

Android binderfs can be mounted via:

mount -t binder binder /dev/binderfs

at which point a new instance of binderfs will show up at /dev/binderfs. In a fresh instance of binderfs no binder devices will be present. There will only be a binder-control device which serves as the request handler for binderfs:

root@edfu:~# ls -al /dev/binderfs/
total 0
drwxr-xr-x  2 root root      0 Jan 10 15:07 .
drwxr-xr-x 20 root root   4260 Jan 10 15:07 ..
crw-------  1 root root 242, 6 Jan 10 15:07 binder-control

binderfs: Dynamically Allocating a New binder Device

To allocate a new binder device in a binderfs instance a request needs to be sent through the binder-control device node. A request is sent in the form of an ioctl(2). Here’s an example program:

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/android/binder.h>
#include <linux/android/binderfs.h>

int main(int argc, char *argv[])
{
        int fd, ret, saved_errno;
        size_t len;
        struct binderfs_device device = { 0 };

        if (argc != 3)
                exit(EXIT_FAILURE);

        len = strlen(argv[2]);
        if (len > BINDERFS_MAX_NAME)
                exit(EXIT_FAILURE);

        memcpy(device.name, argv[2], len);

        fd = open(argv[1], O_RDONLY | O_CLOEXEC);
        if (fd < 0) {
                printf("%s - Failed to open binder-control device\n",
                       strerror(errno));
                exit(EXIT_FAILURE);
        }

        ret = ioctl(fd, BINDER_CTL_ADD, &device);
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        if (ret < 0) {
                printf("%s - Failed to allocate new binder device\n",
                       strerror(errno));
                exit(EXIT_FAILURE);
        }

        printf("Allocated new binder device with major %d, minor %d, and "
               "name %s\n", device.major, device.minor,
               device.name);

        exit(EXIT_SUCCESS);
}

What this program does is open the binder-control device node and send a BINDER_CTL_ADD request to the kernel. Users of binderfs need to tell the kernel which name the new binder device should get. By default a name can only contain up to 256 chars including the terminating zero byte. The struct which is used is:

/**
 * struct binderfs_device - retrieve information about a new binder device
 * @name:   the name to use for the new binderfs binder device
 * @major:  major number allocated for binderfs binder devices
 * @minor:  minor number allocated for the new binderfs binder device
 *
 */
struct binderfs_device {
       char name[BINDERFS_MAX_NAME + 1];
       __u32 major;
       __u32 minor;
};

and is defined in linux/android/binderfs.h. Once the request is made via an ioctl(2) passing a struct binderfs_device with the name to the kernel, it will allocate a new binder device and return the major and minor number of the new device in the struct (this is necessary because binderfs allocates a major device number dynamically at boot). After the ioctl(2) returns there will be a new binder device located under /dev/binderfs with the chosen name:

root@edfu:~# ls -al /dev/binderfs/
total 0
drwxr-xr-x  2 root root      0 Jan 10 15:19 .
drwxr-xr-x 20 root root   4260 Jan 10 15:07 ..
crw-------  1 root root 242, 0 Jan 10 15:19 binder-control
crw-------  1 root root 242, 1 Jan 10 15:19 my-binder
crw-------  1 root root 242, 2 Jan 10 15:19 my-binder1

binderfs: Deleting a binder Device

Deleting binder devices does not involve issuing another ioctl(2) request through binder-control. They can be deleted via unlink(2). This means that the rm(1) tool can be used to delete them:

root@edfu:~# rm /dev/binderfs/my-binder1
root@edfu:~# ls -al /dev/binderfs/
total 0
drwxr-xr-x  2 root root      0 Jan 10 15:19 .
drwxr-xr-x 20 root root   4260 Jan 10 15:07 ..
crw-------  1 root root 242, 0 Jan 10 15:19 binder-control
crw-------  1 root root 242, 1 Jan 10 15:19 my-binder

Note that the binder-control device cannot be deleted since this would make the binderfs instance unusable. The binder-control device will be deleted when the binderfs instance is unmounted and all references to it have been dropped.

binderfs: Mounting Multiple Instances

Mounting another binderfs instance at a different location will create a new and separate instance from all other binderfs mounts. This is identical to the behavior of devpts, tmpfs, and also - even though never merged in the kernel - kdbusfs:

root@edfu:~# mkdir binderfs1
root@edfu:~# mount -t binder binder binderfs1
root@edfu:~# ls -al binderfs1/
total 4
drwxr-xr-x  2 root   root        0 Jan 10 15:23 .
drwxr-xr-x 72 ubuntu ubuntu   4096 Jan 10 15:23 ..
crw-------  1 root   root   242, 2 Jan 10 15:23 binder-control

There is no my-binder device in this new binderfs instance since its devices are not related to those in the binderfs instance at /dev/binderfs. This means users can easily get their private set of binder devices.

binderfs: Mounting binderfs in User Namespaces

The Android binderfs filesystem can be mounted and used to allocate new binder devices in user namespaces. This has the advantage that binderfs can be used in unprivileged containers or any user-namespace-based sandboxing solution:

ubuntu@edfu:~$ unshare --user --map-root --mount
root@edfu:~# mkdir binderfs-userns
root@edfu:~# mount -t binder binder binderfs-userns/
root@edfu:~# # the "bfs" binary used here is the compiled program from above
root@edfu:~# ./bfs binderfs-userns/binder-control my-user-binder
Allocated new binder device with major 242, minor 4, and name my-user-binder
root@edfu:~# ls -al binderfs-userns/
total 4
drwxr-xr-x  2 root root      0 Jan 10 15:34 .
drwxr-xr-x 73 root root   4096 Jan 10 15:32 ..
crw-------  1 root root 242, 3 Jan 10 15:34 binder-control
crw-------  1 root root 242, 4 Jan 10 15:36 my-user-binder

Kernel Patchsets

The binderfs patchset is merged upstream and will be available when Linux 5.0 gets released. There are a few outstanding patches that are currently waiting in Greg’s char-misc tree (cf. “binderfs: remove wrong kern_mount() call” and “binderfs: make each binderfs mount a new instance”) and some others are queued for the 5.1 merge window. But overall it seems to be in decent shape.

Read more
Colin Ian King

Last year I wrote about kernel commits that are tagged with the "Fixes" tag. Kernel developers use the "Fixes" tag on a bug fix commit to reference an older commit that originally introduced the bug.   The adoption of the tag has been steadily increasing since v3.12 of the kernel:

The red line shows the number of commits per release of the kernel, and the blue line shows the number of commits that contain a "Fixes" tag.

In terms of % of commits that contain the "Fixes" tag, one can see it has been steadily increasing since v3.12 and almost 12.5% of kernel commits in v4.20 are tagged this way.

The fixes tag contains the commit SHA of the commit that was fixed, so one can look up the date of the fix and of the commit being fixed and determine the time taken to fix a bug.
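
As a rough illustration (not the exact scripts used for these results), one could extract those intervals from a kernel git tree with something like:

#!/bin/sh
# For every commit carrying a "Fixes:" tag, print the number of days
# between the commit that introduced the bug and the commit fixing it.
git log --no-merges --format='%H' --grep='Fixes: ' |
while read -r fix; do
    bad=$(git show -s --format='%B' "$fix" |
          sed -n 's/^[Ff]ixes: \([0-9a-f]\{8,\}\).*/\1/p' | head -1)
    [ -n "$bad" ] || continue
    t_fix=$(git show -s --format='%ct' "$fix")
    t_bad=$(git show -s --format='%ct' "$bad" 2>/dev/null) || continue
    echo $(( (t_fix - t_bad) / 86400 ))
done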

As one can see, a lot of issues get fixed in the first few hundred days, and some bugs take years to get fixed.  Zooming into the first hundred days of fixes, the distribution looks like:


...the modal point is at day 4. I suspect these are issues that get found quickly when commits land in linux-next and are found in early testing, integration builds and static analysis.

Out of the thousands of "Fixes" tagged commits and the time to fix an issue one can determine how long it takes to fix a specific percentage of the bugs:


In the graph above, 50% of fixes are made within 151 days of the original commit, ~69% of fixes are made within a year of the original commit and ~82% of fixes are made within 2 years.  The long tail indicates that there are some bugs that take a while to be found and fixed,  the final 10% of bugs take more than 3.5 years to be found and fixed.

Comparing the time to fix issues for kernel versions v4.0, v4.10 and v4.20 for bugs that are fixed in less than 50 days we have:


... the trends are similar, however it is worth noting that more bugs are getting found and fixed a little faster in v4.10 and v4.20 than v4.0.  It will be interesting to see how these trends develop over the next few years.

Read more
Colin Ian King

Analysis of Phoronix Test Suite Benchmarks

I've been recently investigating a wide range of benchmarking tests to find suitable candidates to track down performance regressions in Ubuntu.  Over the past 3 weeks I have attempted to run the entire set of tests in the Phoronix Test Suite on a low-end Xeon server to determine a subset of reliable tests.

For benchmarks to be reliable they must show little variation in results when running the tests multiple times.  The tests also need to run in a timely manner too; waiting several days for a set of results is not very timely.

Linked here is a PDF of my set of results.  Some of the Phoronix tests are not listed as they either took way too long to complete or just didn't run successfully on the server.  Tests that have low variability in the results (that is, the standard deviation of the test runs is low compared to the average) are marked in green, high variability are marked in red.

My testing shows that a large proportion of tests have quite large variability, > 5% (% standard deviation), so they are probably not that trustworthy when comparing machines whose benchmark results show little difference.  For regression testing, I'm going to only consider tests that have a variability of less than 2.5% as this seems like a good way to filter out the more jittery test results.
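
For reference, the variability metric here is just the standard deviation expressed as a percentage of the mean. Given one benchmark result per line (results.txt is a stand-in for wherever the runs are collected), a quick sketch of the computation:

# Mean and % standard deviation for one benchmark result per line.
awk '{ s += $1; ss += $1 * $1; n++ }
     END { m = s / n; sd = sqrt(ss / n - m * m);
           printf "mean %.2f, %%stddev %.2f\n", m, 100 * sd / m }' results.txt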

The bottom line is that some tests are just too variable to be deemed a solid benchmark.  It's always worth sanity checking tests before using them as a gold standard.

Read more
Colin Ian King

Linux I/O Schedulers

The Linux kernel I/O schedulers attempt to balance the need to get the best possible I/O performance while also trying to ensure the I/O requests are "fairly" shared among the I/O consumers.  There are several I/O schedulers in Linux; each tries to solve the I/O scheduling issues using different mechanisms/heuristics and each has its own set of strengths and weaknesses.

For traditional spinning media it makes sense to try and order I/O operations so that they are close together to reduce read/write head movement and hence decrease latency.  However, this reordering means that some I/O requests may get delayed, and the usual solution is to schedule these delayed requests after a specific time.   Faster non-volatile memory devices can generally handle random I/O requests very easily and hence do not require reordering.

Balancing the fairness is also an interesting issue.  A greedy I/O consumer should not block other I/O consumers and there are various heuristics used to determine the fair sharing of I/O.  Generally, the more complex and "fairer" the solution the more compute is required, so selecting a very fair I/O scheduler with a fast I/O device and a slow CPU may not necessarily perform as well as a simpler I/O scheduler.

Finally, the types of I/O patterns on the I/O devices influence the I/O scheduler choice, for example, mixed random read/writes vs mainly sequential reads and occasional random writes.

Because of the mix of requirements, there is no such thing as a perfect all-round I/O scheduler.  The defaults being used are chosen to be a good best choice for the general user; however, this may not match everyone's needs.  To clarify the choices, the Ubuntu Kernel Team has provided a Wiki page describing the choices and how to select and tune the various I/O schedulers.  Caveat emptor applies; these are just guidelines and should be used as a starting point to finding the best I/O scheduler for your particular need.
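
On recent kernels the scheduler is selected per block device through sysfs; for example (sda is just a stand-in for your device, and the available schedulers depend on your kernel configuration):

# List the schedulers available for a device; the active one is shown
# in brackets, e.g. "[mq-deadline] kyber bfq none".
cat /sys/block/sda/queue/scheduler

# Switch the scheduler at runtime (as root); the change does not persist
# across reboots.
echo kyber > /sys/block/sda/queue/scheduler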

Read more
Max

Upgrading my blog theme with the Vanilla CSS Framework

Recently I have spent more and more time thinking about the usability of my website and the cleanliness of its code.

This has brought me to another iteration in its design process.

Of course, also related to my employment at Canonical, I have decided to give our Vanilla Framework a try.

The first thing I had to update was my gulp task, to include node_modules during the build.

//  CSS
const gulp = require("gulp");
const sass = require("gulp-sass");
const sourcemaps = require("gulp-sourcemaps");

// `source` points at the theme's source directory.
gulp.task("css", function () {
  return gulp
    .src(`${source}/assets/css/styles.scss`)
    .pipe(sourcemaps.init())
    .pipe(sass({
      includePaths: ['node_modules']
    }).on('error', sass.logError))
    .pipe(sourcemaps.write("."))
    .pipe(gulp.dest("assets/built/"));
});

Otherwise it wouldn't pick up on the imports. This is also described on the framework page.

Then I have included the basic settings from Vanilla in my styles file with

@import "vanilla-framework/scss/settings";
@import "vanilla-framework/scss/base";
@include vf-base;

The @include is necessary because Vanilla provides mixins via the imported files. This way I can control exactly what I want on my blog, thus reducing bloat and data usage.

Just by doing this, the website, given its semantic HTML, already looks pretty nice:


One thing that I would still like to highlight though is the main content vs the head of the webpage. After browsing through the patterns at https://docs.vanillaframework.io/en/ for a bit I decided that I am only going to need the `p-card` pattern to achieve this.

After including this with @import "vanilla-framework/scss/patterns_card"; and using the needed mixins for the default card, the highlighted one and the content inside the card with

@include vf-p-card-default;
@include vf-p-card-highlighted;
@include vf-p-card-specific-content;

I just had to add classes to my articles on the index page and the content on the post page to achieve the following:

One thing that still bothers me here is the focus of the content on the left side. So I went ahead and adjusted some of the margins to the sides and on the header.

An additional thing I added (or actually removed) afterwards is the font. If you set the $font-base-family variable before including the Vanilla base mixin, it will replace the default Ubuntu font. As my goal is a very data-saving, content-focused website, I have replaced it with serif, sans-serif. This will use the fonts that are installed on the system, in that order.

So if, for any reason, no font with serifs (https://en.wikipedia.org/wiki/Serif) is found, a font without serifs will be used instead.
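
Putting that together, the override simply has to come before the imports (this relies on Vanilla declaring the variable with !default, which is the usual SCSS framework convention):

$font-base-family: serif, sans-serif;

@import "vanilla-framework/scss/settings";
@import "vanilla-framework/scss/base";
@include vf-base;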

Summary

In a few hours I was able to reach a design that is both clear and allows the user to focus on the content, which is written text. The Vanilla Framework, and the way it is set up, made it possible for me to focus on the things that are important to me and disregard the rest.

This way I can leverage all the design decisions around readability etc. that the Vanilla team has put into the framework and still have a bloat-free, brutalist website design to my liking that represents me on the Internet.

This is a great speed improvement and asset for someone like me who appreciates great design, but spends a lot of time in the backend.

Read more