Christopher Halse Rogers

XMir Performance

Or: Why XMir is slower than X, and how we'll fix it

We've had a bunch of testing of XMir now; plenty of bugs, and plenty of missing functionality.

One of the bugs that people have noticed is a 10-20% performance drop over raw X. This is really several bits of missing functionality - we're doing a lot more work than we need to be. Oddly enough, people have also been mentioning that it feels "smoother" - which might be placebo, or unrelated updates, or might be to do with something in the Mir/XMir stack. It's hard to tell; it's hard to measure "smoother". We're not faster, but faster is not the same as smoother.

Currently we do a lot of work in submitting rendering from an X client to the screen, most of which we can make unnecessary.

The simple bit

The simple part is composite bypass support for Mir - most of the time unity-system-compositor does not need to do any compositing - there's just a single full-screen XMir window, and Mir just needs to flip that to the display. This is in progress. This cuts out an unnecessary fullscreen blit.
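
For illustration, the check unity-system-compositor needs to make each frame might look something like this sketch (in C; none of these names are real Mir internals, they're just stand-ins for the test being described):

#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for Mir's internal surface/output types. */
struct rect { int x, y, width, height; };
struct surface {
    struct rect geometry;
    bool opaque;
    bool scanout_capable;   /* the display engine can scan out of its buffer */
};

static bool rects_equal(struct rect a, struct rect b)
{
    return a.x == b.x && a.y == b.y &&
           a.width == b.width && a.height == b.height;
}

/* Bypass compositing only when a single opaque surface exactly covers the
 * output and its buffer can be flipped straight onto the display. */
static bool can_bypass_compositing(const struct surface *surfaces, size_t count,
                                   struct rect output)
{
    return count == 1 &&
           surfaces[0].opaque &&
           surfaces[0].scanout_capable &&
           rects_equal(surfaces[0].geometry, output);
}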

The complicated part is in XMir itself

The fundamental problem is the mismatch between rendering models - X wants the contents of buffers to be persistent; Mir has a GL-ish new-buffer-each-frame model. This means that each time XMir gets a new buffer from Mir it needs to blit the previous frame into it first, and can't simply render straight to Mir's buffer. Now, we can (but don't yet) reduce the size of this blit by tracking what's changed since XMir last saw the buffer - and a lot of the time that's going to be a lot smaller than fullscreen - but there's still some overhead¹.
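
To make the "reduce the size of the blit" idea concrete, here's a hypothetical sketch of the carry-forward copy restricted to the damaged region rather than the whole frame. The names and the 4-bytes-per-pixel assumption are mine, not XMir's:

#include <stdint.h>
#include <string.h>

struct damage_box { int x, y, width, height; };

/* Copy only the region X clients have touched since this buffer was last
 * rendered to, from the previous frame into the fresh Mir buffer. */
static void carry_forward(const uint8_t *previous, uint8_t *next,
                          size_t stride, struct damage_box d)
{
    const size_t bpp = 4;   /* assume a 32-bit pixel format */

    for (int row = 0; row < d.height; ++row)
    {
        size_t offset = (size_t)(d.y + row) * stride + (size_t)d.x * bpp;
        memcpy(next + offset, previous + offset, (size_t)d.width * bpp);
    }
}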

Fortunately, there's a way around this. GLX matches Mir's buffer semantics nicely - each time a client calls SwapBuffers it gets a shiny new backbuffer to render into. So, rather like Compiz's unredirect-fullscreen-windows option, if we've got a fullscreen² GLX window we can hand the buffer received from Mir directly to the client and avoid the copy.

Even better, this doesn't apply only to fullscreen games - GNOME Shell, KWin, and Unity are all fullscreen GLX applications.

As always, there are interesting complications - applications can draw on their GL window with X calls, and applications can try to be fancy and only update a part of their frontbuffer rather than calling SwapBuffers; in either case we can't bypass. Unity does neither, but Shell and KWin might.
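
Conceptually, then, the per-window check before handing the Mir buffer straight to the client looks something like this - again, a hypothetical sketch, not actual XMir code:

#include <stdbool.h>

struct xmir_window_state {
    bool sized_to_mir_buffer;   /* "fullscreen", for now */
    bool touched_by_x11;        /* core X / XRender drawing since the last swap */
    bool partial_frontbuffer;   /* client updates part of the frontbuffer itself */
};

/* Only hand the Mir buffer directly to the GLX client when nothing else
 * has drawn on the window and it presents via SwapBuffers. */
static bool can_bypass_blit(const struct xmir_window_state *w)
{
    return w->sized_to_mir_buffer &&
           !w->touched_by_x11 &&
           !w->partial_frontbuffer;
}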

Enter the cursor

In addition to the two unnecessary fullscreen blits - X root window to Mir buffer, Mir buffer to framebuffer - XMir currently uses X's software cursor code. This causes two problems. Firstly, it means we're doing X11 drawing on top of whatever's underneath, so we can't do the SwapBuffers trick. Secondly, it causes a software fallback whenever you move the cursor, making the driver download the root window into CPU accessible memory, do some CPU twiddling, and then upload again to GPU memory. This is bad, but not terrible, for Intel chips where the GPU and CPU share the same memory but with different caches and layouts. It's terrible for cards with discrete memory. Both these problems go away once we support setting the HW cursor image in Mir.
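
For reference, "setting the HW cursor image" on desktop hardware ultimately comes down to a couple of KMS calls - the cursor lives in its own small buffer and the display controller composites it during scanout, so nothing touches the framebuffer. This is a hedged sketch of those calls via libdrm, not what Mir's driver interface will actually look like, and creating the cursor buffer object is elided:

#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Point the CRTC's hardware cursor at a 64x64 ARGB buffer object, then
 * move it. Build against libdrm (pkg-config --cflags --libs libdrm). */
int show_hw_cursor(int drm_fd, uint32_t crtc_id, uint32_t cursor_bo_handle)
{
    if (drmModeSetCursor(drm_fd, crtc_id, cursor_bo_handle, 64, 64) != 0)
        return -1;

    return drmModeMoveCursor(drm_fd, crtc_id, 100, 100);
}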

Once those three pieces land there shouldn't be a meaningful performance difference between XMir-on-Mir and X-on-the-hardware.

¹: If we implemented a single-buffer scheme in Mir we could get rid of this entirely at the cost of either losing vsync or blocking X rendering until vsync. That's probably not a good tradeoff.

²: Technically, if we've got a GLX client whose size matches that of the underlying Mir buffer. For the moment, that means "fullscreen", but when we do rootless XMir for 14.04 all windows will be backed by a Mir buffer of the same size.

Christopher Halse Rogers

Artistic differences

The latest entry in my critically acclaimed series on Mir and Wayland!

Wayland, Mir, and X - different projects

Apart from the architectural differences between them, which I've covered previously, Mir and Wayland also have quite different project goals. Since a number of people seem to be confused as to what Wayland actually is - and that's not unreasonable, because it's a bit complicated - I'll give a run-down as to what all these various projects are and aim to do, throwing in X11 as a reference point.

X11, and X.org

Everyone's familiar with their friendly neighbourhood X server. This is what we've currently got as the default desktop Linux display server. For the purposes of this blog post, X consists of:

The X11 protocol

You're all familiar with the X11 protocol, right? This gentle beast specifies how to talk to an X server - both the binary format of the messages you'll send and receive, and what you can expect the server to do with any given message (the semantics). There are also plenty of protocol extensions; new messages to make the server do new things, like handle more than one monitor in a non-stupid way.

The X11 client libraries

No-one actually fiddles with X by sending raw binary data down a socket; they use the client libraries - the modern XCB, or the boring old Xlib (also known as libX11.so.6). They do the boring work of throwing binary data at the server and present the client with a more civilised view of the X server, one where you can just XOpenDisplay(NULL) and start doing stuff.
Actually, the above is a bit of a lie. Almost all the time people don't even use XCB or Xlib directly; they use toolkits such as GTK+ or Qt, which in turn use Xlib or XCB.
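
Still, the raw Xlib experience really is that simple - here's a minimal (and rather useless) client that opens the default display and maps a bare window, built with cc example.c -lX11:

#include <X11/Xlib.h>
#include <stdio.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);   /* connect to $DISPLAY */
    if (!dpy)
    {
        fprintf(stderr, "cannot open display\n");
        return 1;
    }

    Window w = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy),
                                   0, 0, 320, 240, 0, 0,
                                   WhitePixel(dpy, DefaultScreen(dpy)));
    XMapWindow(dpy, w);   /* ask the server to show it */
    XFlush(dpy);          /* push the buffered requests down the socket */

    /* a real client would run an event loop here */

    XCloseDisplay(dpy);
    return 0;
}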

The Xorg server

This would be the bit most obviously associated with X - the one, the only, the X.org X server! This is the actual /usr/bin/X display server we all know and love. Although there are other implementations of X11, this is all you'll ever see on the free desktop. Or on OS X, for that matter.

So that's our baseline stack - a protocol, one or more client libraries, a display server implementation. How about Wayland and Mir?

The Wayland protocol

The Wayland protocol is, like the X11 protocol, a definition of the binary data you can expect to send and receive over a Wayland socket, and the semantics associated with those binary bits. This is handled a bit differently to X11; the protocol is specified in XML, which is processed by a scanner and turned into C code. There is a binary protocol, and you can technically¹ implement that protocol without using the wayland-scanner-generated code, but it's not what you're expected to do.

Also different from X11 is that everything's treated as an extension - you deal with all the interfaces in the core protocol the same way as you deal with any extensions you create. And you create a lot of extensions - for example, the core protocol doesn't have any buffer-passing mechanism other than SHM, so there's an extension for drm buffers in Mesa. The Weston reference compositor also has a bunch of extensions, both for ad-hoc things like the compositor<->desktop-shell interface, and for things like XWayland.

The Wayland client library

Or libwayland. A bit like XCB and Xlib, this is basically just an IPC library. Unlike XCB and Xlib, it can also be used by a Wayland server for server→client communication. Also unlike XCB and Xlib, it's programmatically generated from the protocol definition. It's quite a nice IPC library, really. Like XCB and Xlib, you're not really expected to use this anyway; you're expected to use a toolkit like Qt or GTK+, and EGL + your choice of Khronos drawing API if you want to be funky.
There's also a library for reading X cursor themes in there.
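
To give a flavour of the API, here's a minimal libwayland-client program that connects to a compositor and prints the globals it advertises - essentially what any client does before binding the interfaces it wants. It uses only core libwayland calls; build with cc example.c -lwayland-client:

#include <stdint.h>
#include <stdio.h>
#include <wayland-client.h>

static void global_added(void *data, struct wl_registry *registry,
                         uint32_t name, const char *interface, uint32_t version)
{
    printf("global %u: %s (version %u)\n", name, interface, version);
}

static void global_removed(void *data, struct wl_registry *registry, uint32_t name)
{
    /* a real client would drop any objects bound to this global */
}

static const struct wl_registry_listener registry_listener = {
    global_added,
    global_removed
};

int main(void)
{
    struct wl_display *display = wl_display_connect(NULL);  /* $WAYLAND_DISPLAY */
    if (!display)
    {
        fprintf(stderr, "cannot connect to a Wayland compositor\n");
        return 1;
    }

    struct wl_registry *registry = wl_display_get_registry(display);
    wl_registry_add_listener(registry, &registry_listener, NULL);
    wl_display_roundtrip(display);   /* wait for the initial burst of globals */

    wl_display_disconnect(display);
    return 0;
}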

The Wayland server?

This is where it diverges; there is no Wayland server in the sense that there's an Xorg server. There's Weston, the reference compositor, but that's strictly intended as a testbed to ensure the protocol works.

Desktop environments are expected to write their own Wayland server, using the protocol and client libraries.

The Mir protocol?

We kinda have an explicit IPC protocol, but not really. We don't intend to support re-implementations of the Mir client libraries, and will make no effort to not break them if someone tries. We're using Google Protobuf for our IPC format, by the way.

The Mir client libraries

What toolkit writers use; it's even called mir_toolkit. Again, you're probably not going to use this directly; you're going to use a toolkit like GTK+ or Qt, and like Wayland if you want to draw directly you'll be using EGL + GL/GLES/OpenVG.

The Mir server?

Kind of. Where the Wayland libraries are all about IPC, Mir is about producing a library to do the drudge work of a display-server-compositor-thing, so in this way it's more like Xorg than Wayland. In fact, it's a bit more specific - Mir is about creating a library to make the most awesome Unity display-server-compositor-thingy. We're not aiming to satisfy anyone's requirements but our own. That said, our requirements aren't likely to be hugely specific, so Mir will likely be generally useful.

To some extent this is why GNOME and KDE aren't amazingly enthused about Mir. They already have a fair bit of work invested in their own Wayland compositors, so a library to build display-server-compositor-thingies on isn't hugely valuable to them. Right now.

Perhaps we'll become so awesome that it'll make sense for GNOME or KDE to rebase their compositors on Mir, but that's a long way away.

¹: Last time I saw anyone try this on #wayland there were problems around the interaction with the Mesa EGL platform which meant you couldn't really implement the protocol without using the existing C library. I'm not sure if that got resolved.

Christopher Halse Rogers

Server Allocated Buffers in Mir

…Or possibly server owned buffers

One of the significant differences in design between Mir and Wayland compositors¹ is the buffer allocation strategy.

Wayland favours a client-allocated strategy. In this, the client code asks the graphics driver for a buffer to render to, and then renders to it. Once it's done rendering it gives a handle² to this buffer to the Wayland compositor, which merrily goes about its job of actually displaying it to the screen.

In Mir we use a server-allocated strategy. Here, when a client creates a surface, Mir asks the graphics driver for a set of buffers to render to. Mir then sends the client a reference to one of the buffers. The client renders away, and when it's done it asks Mir for the next buffer to render to. At this point, Mir sends the client a reference to another buffer, and displays the first buffer.

You can see this in action - in the software-rendered case - in demo_client_unaccelerated.c (I know the API here is a bit awkward; I'm agitating for a better one).
The meat of it is:


mir_wait_for(mir_surface_next_buffer(surface, surface_next_callback, 0));
which asks the server for the next buffer and blocks until the server has replied. This also tells the server that the current buffer is available to display.
Then

mir_surface_get_graphics_region(surface, &graphics_region);
gets a CPU-writeable block of memory from the current buffer. You'll note that we don't have a mir_wait_for around this; that's because we've already got the buffer from the server above; this just accesses it.

The code then writes an interesting or, in fact, highly boring pattern to the buffer and loops back to next_buffer to display that rendering and get a new buffer, ad infinitum.
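
Putting those two calls together, the whole loop is roughly the sketch below. Connection and surface setup are elided, and the header path, callback signature, and MirGraphicsRegion fields are as I recall them from the demo source - check demo_client_unaccelerated.c for the authoritative version:

#include <string.h>
#include <mir_toolkit/mir_client_library.h>

static void surface_next_callback(MirSurface *surface, void *context)
{
    (void)surface;
    (void)context;   /* nothing to do; we block on mir_wait_for() instead */
}

static void render_loop(MirSurface *surface)
{
    MirGraphicsRegion region;

    for (;;)
    {
        /* CPU-writeable view of the buffer this client currently owns. */
        mir_surface_get_graphics_region(surface, &region);

        /* The "highly boring pattern": grey, assuming a 32-bit format. */
        for (int y = 0; y < region.height; ++y)
            memset(region.vaddr + y * region.stride, 0x80,
                   (size_t)region.width * 4);

        /* Hand the finished buffer back for display; block until the
         * server replies with the next buffer to render into. */
        mir_wait_for(mir_surface_next_buffer(surface, surface_next_callback, 0));
    }
}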

Great. So why are you doing it, again?

The need for server-allocated buffers is primarily driven by the ARM world and Android graphics stack, an area in which I'm blissfully unaware of the details. So, as with my second-hand why Mir post, you, gentle reader, get my own understanding.

The overall theme is: server-allocated buffers give us more control over resource consumption.

ARM devices tend to be RAM constrained, while at the same time having higher resolution displays than many laptops. Applications also tend to be full-screen. Window buffers are on the order of 4 MB for mid-range screens, 8 MB for the fancy 1080p displays on high end phones, and 16 MB for the Nexus 10s. Realistically you need to at least double-buffer, and it's often worth triple-buffering, so you're eating 8-12MB on the low end and 32-48MB on the high end, per window. Have a couple of applications open and this starts to add up on devices with between 512MB and 2GB of RAM and no dedicated VRAM.
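
If you want to check my arithmetic, it's just width × height × 4 bytes per pixel; the "mid-range" resolution below is my own illustrative pick rather than anything official:

#include <stdio.h>

int main(void)
{
    const struct { const char *name; int w, h; } screens[] = {
        { "mid-range (1280x768)",    1280,  768 },
        { "1080p phone (1920x1080)", 1920, 1080 },
        { "Nexus 10 (2560x1600)",    2560, 1600 },
    };

    for (int i = 0; i < 3; ++i)
    {
        double mb = screens[i].w * screens[i].h * 4.0 / (1024 * 1024);
        printf("%-26s ~%4.1f MB per buffer, ~%4.1f MB triple-buffered\n",
               screens[i].name, mb, 3 * mb);
    }
    return 0;
}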

In addition to their lack of dedicated video memory, ARM GPUs also tend to be much less flexible. On a desktop GPU you can generally ask for as many scanout-capable³ buffers as you like. ARM GPUs tend to have several restrictions on scanout buffers. Often they need to be physically contiguous, which in practice means allocated out of a limited block of memory reserved at boot. Sometimes they can only scan out from particular physical addresses. I'm sure there are other oddities out there.

So scanout buffers are a particularly precious resource. However, you really want clients to be able to use scanout-capable buffers - in the common case where you've got a single, opaque, fullscreen window, you don't have to do any compositing, so having the application draw directly to the scanout buffer saves you a copy; a significant performance win. Desktop aficionados might recognise this as the same effect as Compiz's unredirect fullscreen windows option.

So, there's the problem domain. What do server-allocated buffers buy us?

The server can control access to the limited scanout buffers.
This one's pretty self-explanatory.

The server can steal buffers from inactive clients.
When applications aren't visible, they don't really need their window buffers, do they? The server can either destroy the inactive client's buffers, or clear them and hand them off to a different application.

When applications are idle for an extended period we'll want to clear up more resources - maybe destroy their EGL state, and eventually kill them entirely. However, applications will be able to resume faster if we've just killed their window buffers than if we've killed their whole EGL state.

The server can easily throttle clients.
There's an obvious way for the server to prevent a client from rendering faster than it needs to; delay handing out the next buffer.

Finally:

Having a single allocation model is cleaner.
It's cleaner to have just one allocation model. We need to do at least some server-allocation, so that's the one allocation model. Plus, even though desktops have lots more memory than phones, people still care about memory usage ☺.

But doesn't X do server-allocated buffers, and isn't it terrible?

Yes. But we're not doing most of the things that make server-allocated buffers terrible. In X's case
  • X allocates all sorts of buffers, not just window surfaces. We leave the client to allocate pixmap-like things - that is, GL textures, framebuffer objects, etc - itself. Notably, we're not allocating ancillary buffers in the server, just the colour buffers. We shouldn't need to change protocol to support new types of buffers (HiZ springs to mind), as has happened with DRI2.
  • X isn't particularly integrated. When you resize an X window, the client gets a DRI2 Invalidate event, indicating that it needs to request new buffers. It also gets a separate ConfigureNotify event about the actual resize. We won't have this duplication; clients will automatically get buffers of the new size on the next buffer advance, and these buffers know their size.
  • X has more round-trips than necessary. After submitting rendering with SwapBuffers, clients receive an Invalidate event. They then need to request new, valid buffers. We won't have as many roundtrips; the server responds to a request to submit the current rendering with the new buffer.

A possibly significant disadvantage we can't mitigate is that the server no longer knows how big the client thinks its window is, so we may display garbage in the window if the client thinks its window is smaller than the server's window. I'm not sure how much of a problem that will be in practice.

Summary

Many of the benefits of server-allocation could also be gained with sufficiently smart clients and some extra protocol messages - although that falls down once you start suspending idle clients, as it requires the client to actually be running. We think that doing it in the server will be easier, cleaner, and more reliable.

I'm less convinced that server-allocation is the right choice than the rest of Mir's architecture. I'm not convinced that it's a mistake, either, and some of the benefits are attractive. We'll see how it goes!


¹: Yes, I know you can write a server-allocated buffer model with a Wayland protocol. All the Wayland compositors for which we have source, however, use a client-allocated model, and that's what existing Wayland client code expects. Feel free to substitute “existing Wayland compositors” whenever I say “Wayland”.

²: Actually, in the common (GPU-accelerated) case the client itself basically only gets a handle to the buffer; the GEM handle, which it gets from the kernel.

³: scanout is the process of reading from video memory and sending to an output encoder, like HDMI or VGA. A scanout-capable buffer is the only thing the GPU is capable of putting on a physical display.

Christopher Halse Rogers

For posterity

This is based on my series of G+ posts

Standing on the shoulders of giants

We've recently gone public (yay!) with the Mir project that we've been working on for some months now.

It's been a bit rockier than I'd hoped (boo!). Particularly, we offended people with incorrect information on the wiki page we wanted to direct the inevitable questions to.

I had proof-read this, and didn't notice it - I'm familiar with Wayland, so even with “X's input has poor security” and “Wayland's input protocol may duplicate some of the problems of X” juxtaposed I didn't make the connection. After all, one of the nice things about Wayland is that it solves the X security problems! It was totally reasonable to read what was written as “Wayland's input protocol will be insecure, like X's”, which is totally wrong; sorry to all concerned for not picking that up, most especially +Kristian Høgsberg and +Daniel Stone.

Now that the mea-culpa's out of the way…

Although we've got a “why not Wayland/Weston” section on the wiki page, there's a bunch of speculation around about why we really created Mir, ranging from the sensible (we want to write our own display server so that we can control it) to the not-so-sensible (we're actually a front company of Microsoft to infiltrate and destroy Linux). I don't think the rationale on the page is inaccurate, but perhaps it's not clear.


Why Mir?


Note: I was not involved in the original decision to create Mir rather than bend Wayland to our will. While I've had discussions with those who were, this is filtered through my own understanding, so treat this as my interpretation of the thought-processes involved. Opinions expressed do not necessarily reflect the opinions of my employer, etc.


We wanted to integrate the shell with a display server

There are all sorts of frustrations involved in writing a desktop shell in X. See any number of Wayland videos for details :).

We therefore want Wayland, or something like it.


We don't want to use Weston (and neither does anyone else)

Weston, the reference Wayland compositor, is a test-bed. It's for the development of the Wayland protocol, not for being an actual desktop shell. We could have forked Weston and bent it to our will, but we're on a bit of an automated-testing run at the moment, and it's generally hard to retro-fit tests onto an existing codebase. Weston has some tests, but we want super-awesome-tested code.

It's perhaps worth noting that neither GNOME nor KDE have based their Wayland compositor work on Weston.

We don't want Weston, but maybe we want Wayland?


What about input?

At the time Mir was started, Wayland's input handling was basically non-existent. +Daniel Stone's done a lot of work on it since then, but at the time it would have looked like we needed to write an input stack.

Maybe we want Wayland, but we'll need to write the input stack.


We want server-side buffer allocation; will that work?

We need server-side buffer allocation for ARM hardware; for various reasons we want server-side buffer allocation everywhere. Weston uses client-side allocation, and the Wayland EGL platform in Mesa does likewise. Although it's possible to do server-side allocation in a Wayland protocol, it's swimming against the tide.

Maybe we want Wayland, but we'll need to write an input stack and patch XWayland and the Mesa EGL platform.


Can we tailor this to Unity's needs?

We want the minimum possible complexity; we ideally want something tailored exactly to our requirements, with no surplus code. We want different WM semantics to the existing wl_shell and wl_shell_surface, so we ideally want to throw them away and replace them with something new.

Maybe we want Wayland, but we'll need to write an input stack, patch XWayland and the Mesa EGL platform, and redo the WM handling in all the toolkits.


So, does it make sense to write a Wayland compositor?

At this point, it looks like we want something like Wayland, but different in almost all the details. It's not clear that starting with Wayland will save us all that much effort, so the upsides of doing our own thing - we can do exactly and only what we want, we can build an easily-testable code base, we can use our own infrastructure, we don't have an additional layer of upstream review - look like they'll outweigh the costs of having to duplicate effort. 

Therefore, Mir.

Christopher Halse Rogers

Mir and YOU!

This is still based on my series of G+ posts

But Chris! I don't care about Unity. What does Mir mean for me?



The two common concerns I've seen on my G+ comment stream are:


  • With Canonical focusing on Mir rather than Wayland, what does this mean for GNOME/Kubuntu/Lubuntu? What about Mint?


  • Does this harm other distros by fragmenting the Linux driver space?



  • What does this mean for GNOME/Kubuntu/etc?


    The short answer, for the short-to-mid-term, is: not much.

    We'll still be keeping X available and maintained. Even after we shove a Mir system-compositor underneath it there will still be an X server available for sessions.

    Let's review the architecture diagram from MirSpec:

    Here we see a single Mir system compositor - the top box - and three clients of the system compositor: a greeter (leftmost box) and two Unity-Next user sessions.

    You can replace any of the Mir boxes which contain the sessions - “Unity Next” - with an X server + everything you'd normally run on the X server. The top “System Compositor/Shell” box is just there to support the user sessions - to handle the transitions startup splash → greeter, greeter → user session, user session → user session, and user session → shutdown splash. We'll probably also use the system compositor to provide a proper secure authentication mechanism, too.

    Note that the “Unity Next” boxes will also be running an X server, but in this case it will be a “rootless” X server. Basically, that means that it doesn't have a desktop window, just free standing application windows. This will be how we support legacy X11 apps in a Unity Next session. This is also how Wayland compositors, such as Weston, handle X11 compatibility - and, for that matter, how X11 works on OSX.

    The rootless X server in Unity Next will not be able to run KWin or Mutter or whatever. This isn't losing anything, though - you can't currently run KWin or GNOME Shell in Unity, or vice versa, even though everything runs on X.

    So, if the short-term impact is “approximately none”, how about the long-term impact?



    This somewhat depends on what other projects do. Remember how we can run full X servers under the System Compositor? We can do the same with Weston. Weston has multiple backends; you can already run Weston-on-Wayland, just as you can run Weston-on-X11. Once we've got input sorted out in Mir it'll probably be a fun weekend hack to add a Mir backend to Weston, and have Weston-on-Mir. You're welcome to do that if you get to it before me, gentle reader ☺.

    So what happens if KDE or GNOME build a Wayland compositor? Well, we don't really know, because they haven't finished yet¹. Neither of them are using Weston, so a Weston-on-Mir backend won't be terribly useful. If their compositor architecture allows multiple backends then writing a Mir backend for them shouldn't be terrible; in that case, we can replace any of the “Unity Next” sessions in the diagram with “GNOME Shell Next” or “KDE Next”, collect our winnings, and drive off into the sunset in our shiny new Ferrari.

    The result that requires the most work is if KDE or GNOME additionally build a system compositor. This seems quite likely, as a system compositor makes a bunch of gnarly problems go away. In that case you'd not only need to write a Mir backend for their compositor, you'd also need to implement whatever interfaces they use on their system compositor.

    We see this a bit already, actually, with GNOME Shell and GDM; Shell uses interfaces which LightDM doesn't provide but GDM does, so things break.

    The worst-case scenario here is that you need GNOME's stack to run GNOME, KDE's stack to run KDE, and Unity's stack to run Unity, and you need to select between them at boot rather than at the greeter.

    So, in summary: Mir doesn't break GNOME/KDE/XFCE now. These projects may change in a way that's incompatible with Mir in future, but that's (a) in the future and (b) solvable.

    Does this fragment the Linux graphics driver space?



    Ish.

    Mir only exists because of all the work Wayland devs have done around the Linux graphics stack. It uses exactly the same stack that Weston's drm backend does².

    The XMir drivers are basically the same as the XWayland drivers - they're stock xf86-video-intel, xf86-video-ati, xf86-video-nouveau with a small patch. They're not the same, but they're not hugely different.

    The Mir EGL platform looks almost exactly the same as the Wayland EGL platform. Which looks almost exactly the same as the GBM EGL platform, which looks almost exactly the same as the X11 EGL platform. Code reuse and interfaces are awesome - we might be submitting patches to share even more code here.

    Weston currently doesn't have any proprietary driver support; similarly, neither does Mir.

    We're talking with NVIDIA and AMD to get support for running Mir on their proprietary drivers, and providing an interface for proprietary drivers in general. It's early days; we don't have anything concrete; and even if we did, I probably wouldn't be able to divulge it. But it's likely that whatever we come up with to support Mir on NVIDIA will also support Wayland on NVIDIA.

    So, driver divergence - real, but probably overblown.

    ¹: There's an old GNOME Shell branch that's been seeing some love recently, and KDE has a kwin branch for Wayland support.
    ²: On the desktop. On Android, it uses the Android stack.
