Canonical Voices

Posts tagged with 'planet-ubuntu'


This is the first article in a series of blog posts on Mir’s and XMir’s performance. The idea is to provide further insights into the overall performance work, point out existing bottlenecks and how the team is addressing them.

Our overall goal for Mir and XMir is to provide an absolutely fluid user experience, both in typical desktop usage and in more demanding scenarios like 3D gaming. Moreover, our efforts to provide a fluid user experience on the desktop should have at most a minimal impact on overall 3D application performance.

Over the last weeks and months, a lot of people have asked whether, and to what degree, the introduction of a system-level compositor impacts graphical performance. The short answer is: Yes, any additional layer between the GPU and the actual rendering process has an impact on the overall performance characteristics of the system. However, there are ways to avoid most of the overhead, and this blog post is the not-so-short answer to the initial question.

As its name implies, a compositor is responsible for taking multiple buffer streams or surfaces and assembling (a.k.a. compositing) a final image that is then scanned out to the connected monitors. In the general case, composition requires buffering of the final image, and it requires GPU resources to render the individual surfaces to the destination buffer in preparation for scanout. Here, the destination buffer is the framebuffer. The overhead of a system-level compositor can be summarized as this additional rendering step in the overall graphics pipeline, in exchange for the obvious benefit of being able to control the final output and enabling flicker-free boot, shutdown, resume, suspend and session-switching.
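To make the extra rendering step concrete, here is a minimal sketch of composition in Python. It is a toy model, not Mir code: the `Surface` class, the `composite` function and the character-grid "framebuffer" are all assumptions made for illustration, and a real compositor would do this work on the GPU.

```python
from dataclasses import dataclass


@dataclass
class Surface:
    """Hypothetical client surface: a rectangle filled with one 'pixel' value."""
    x: int
    y: int
    width: int
    height: int
    color: str   # stand-in for the surface's pixel content
    z: int = 0   # stacking order; higher values are drawn on top


def composite(surfaces, out_width, out_height, background="."):
    """Render all surfaces back to front into a fresh destination buffer."""
    framebuffer = [[background] * out_width for _ in range(out_height)]
    for s in sorted(surfaces, key=lambda s: s.z):
        # Clip the surface against the output before copying its pixels.
        for row in range(max(s.y, 0), min(s.y + s.height, out_height)):
            for col in range(max(s.x, 0), min(s.x + s.width, out_width)):
                framebuffer[row][col] = s.color
    return framebuffer


frame = composite(
    [Surface(0, 0, 4, 2, "a"), Surface(2, 1, 3, 2, "b", z=1)],
    out_width=6,
    out_height=3,
)
for line in frame:
    print("".join(line))
```

Every pixel of every visible surface is copied once more before scanout; this per-frame copy is the overhead the paragraph above refers to.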

Both internally and externally, people have been measuring the overall performance impact of XMir as available from the archive today. Roughly speaking, they have been reporting a performance impact of ~20% in the Phoronix Test Suite, and the question becomes: How can we significantly decrease the impact in the specific case of XMir while still keeping all the aforementioned benefits in place?

The underlying idea is straightforward: If the compositor is clever enough, it can recognize situations where an opaque client surface covers a complete output (XMir matches exactly this configuration). In that case, composition can be avoided and the client should be provided with a framebuffer as its rendering target instead of the usual graphics memory buffer. Moreover, the server-side composition strategy can be smart, completely skip the final composition step and scan out the framebuffer as soon as the client signals “done”. Luckily, Mir’s composition engine and the associated buffer allocation/swapping infrastructure allow for implementing this behavior easily and transparently to the client.

The respective implementation has existed for some time now, and we have been testing it in parallel to trunk. Our primary test and benchmarking platform was Intel, and we haven’t seen any issues with the patch on that platform. There is a graphical glitch present on ATI cards that we are actively working on. Nouveau gives us some headaches as it is quite slow on both X and XMir right now. However, we are confident that we won’t see any major issues in XMir once the underlying cause in the Nouveau driver is fixed.
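The bypass decision described above can be sketched roughly as follows. This is not Mir’s actual implementation: `Rect`, `Surface` and `bypass_candidate` are hypothetical names, and the sketch assumes that "covers a complete output" means the topmost surface matches the output rectangle exactly.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class Surface:
    """Hypothetical client surface as seen by the compositor."""
    name: str
    rect: Rect
    opaque: bool
    z: int  # stacking order; higher values are on top


def bypass_candidate(surfaces: Sequence[Surface], output: Rect) -> Optional[Surface]:
    """Return the surface that can be scanned out directly, or None.

    Assumption: bypass is possible only if the topmost surface is opaque
    and matches the output rectangle exactly, so nothing else is visible.
    """
    if not surfaces:
        return None
    top = max(surfaces, key=lambda s: s.z)
    if top.opaque and top.rect == output:
        return top
    return None


output = Rect(0, 0, 1920, 1080)
xmir = Surface("xmir", Rect(0, 0, 1920, 1080), opaque=True, z=0)
overlay = Surface("overlay", Rect(100, 100, 300, 200), opaque=False, z=1)

print(bypass_candidate([xmir], output))           # the XMir surface: bypass
print(bypass_candidate([xmir, overlay], output))  # None: composite as usual
```

When a candidate is found, the compositor would hand that client a framebuffer to render into and skip its own composition pass; otherwise it falls back to regular composition.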


Measuring graphical performance and developing meaningful benchmarks is a complex task in its own right. Luckily, we have some pretty capable tooling available in the open-source world. During development and evaluation of the bypass feature, we have been relying on selected test cases of the Phoronix Test Suite and on glmark2 to continuously evaluate performance gains and overall impact. We are going to publish the results across Intel, NVIDIA and AMD GPUs as part of our regular QA reporting as soon as we hit trunk. In summary, we are able to reduce XMir’s total overhead to ~6% on Nexuiz and OpenArena (see section “Conclusions and Future Work” for the reasons behind the remaining overhead and approaches to reducing it further). Please also note that we are actively investigating the results for the “QGears2: OpenGL + Image scaling” test case:

[Charts: GUI Toolkits - Intel 2500; GUI Toolkits - Intel 3000; GUI Toolkits - Intel 4000; Nexuiz HDR Off; Nexuiz HDR On; OpenArena]

GLMark2 numbers are not yet reported via the public dashboard but we are actively working on wiring them up as part of our daily quality efforts, too. However, the numbers are quite promising as can be seen from this preview (Lenovo x220, i7 vPro, Intel(R) HD Graphics 3000):


Conclusions & Future Work

Today, we are landing an important GPU-bound optimization for the XMir use case with the bypass feature, and we see significant performance improvements in our benchmarking scenarios. Everyday users will hardly notice any difference in graphical performance, but they should notice a decrease in power usage on laptops because the system compositor requires fewer GPU and CPU cycles to carry out its tasks.

However, this is only the first step and we still see some overhead in the benchmarks. Our GLMark2 benchmark numbers for raw Mir, when compared to X as in Saucy today, suggest that we still have GPU-bound optimization potential that we should leverage in the XMir case. The unity-system-compositor performance is not the bottleneck in this specific scenario and we need to become smarter on the X side of things. In summary, we need to propagate the bypass approach further down into the X world and its clients, with X/Compiz handing out the raw buffer provided by Mir to fullscreen, opaque X clients. Luckily, Compiz already knows about the notion of composite bypass, too, and the remaining optimization potential lies mostly within X itself, in making it more aware of the fact that it is now living in a world of nested compositors. Quite likely, though, Mir will require adjustments, too, to expose composition bypass end-to-end in the XMir scenario. Stay tuned, we will keep you posted within this series of blog posts.

[Update] Michael of Phoronix found out that some games, when run in fullscreen mode but not at native resolution, do not benefit from composition bypass. As mentioned in one of the comments, we are now starting to investigate these kinds of issues and will come back with updates once we have identified the root causes. At any rate: Thanks for bringing it up, we will make sure that the respective benchmark/setup is present in our benchmarking setup, too.

Filed under: Canonical, planet-ubuntu, Quality, Technology, Ubuntu, Uncategorized


To put it straight: A user should be able to invoke an arbitrary number of applications and experience neither a slow-down nor ever have to wonder whether an app is already running. It is just there whenever it is invoked and the system takes care of the rest.

When we started working on Ubuntu Touch roughly a year ago, our primary focus was the enablement of central HW components such that Ubuntu would be able to run on common mobile form factors. However, after having reached the goal of being able to leverage HW acceleration for UI, media decoding and accessing the on-board sensors, we started thinking about our application model that we wanted to deeply integrate with the OS. From a user’s and a developer’s perspective, our primary goals are:

  • Provide a consistent application model that spans installation, execution and de-installation of apps.
  • Ensure security at all stages and account for the fact that apps have to be considered harmful.
  • Ensure a seamless multi-tasking experience that is transparent to the user and does not require the user to think in terms of running/not running.
  • Make the application model as easy to develop with as possible.

From the system’s point of view, our objectives are:

  • Integrate a well-defined confinement model deep within the system.
  • Enable the system to aggressively control the resource consumption of apps.
  • Enable a seamless transition to the converged world.

Every single objective listed above is a challenge on its own. On top of that, they are interdependent and even conflicting at times. However, one of the most fundamental building blocks of the overall application model is the application lifecycle, and this blog post is dedicated to explaining both our lifecycle model and policies.

From a user’s perspective

A mobile device is an environment offering a limited set of computing resources, i.e., CPU cycles, main memory, GPU cycles, graphics memory and power. Running applications are competing for these resources and we have to assume that applications are greedy, trying to use as much of the available resources as possible. The user expects the system to ensure a fluent user experience while providing the longest possible battery life at the same time. On top of this, the user should not be required to carry out any sort of process management tasks or to build a mental model of the different run states an app can be in. Ultimately, the application lifecycle should be completely transparent to the user and multi-tasking should be seamless.

A solution to the problem needs to satisfy the following additional constraints:

  1. The application lifecycle model should be easy to develop with. That is, the changes to the well-known process state machine should be as small as possible, providing sensible and robust fallback behavior.
  2. As Ubuntu is working towards a converged world, the lifecycle model needs to be adaptable to a range of different scenarios: from mobile phones, through tablets, to classic desktop environments. The differences should be transparent to the developer, and both applications and the overall system need to be able to transition seamlessly from one use case to the other. This is especially important when thinking about the Ubuntu Edge, with the phone being a full-featured desktop/laptop replacement when docked or connected to a large screen.

The Application Lifecycle Model

As noted earlier, one of our goals is the ability to define different lifecycle policies and swap them out dynamically at runtime to account for different usage scenarios. We want to minimize the impact on developers when moving to a converged world and make lifecycle policy changes and decisions transparent to users and developers alike. To this end, we clearly separate the application lifecycle model from the policies that the system executes on top of it. The model we are currently putting in place extends the well-known process state machine, as presented in the following diagram:

[Diagram: Draft - Application Model (1)]

The states are defined as:

  • Focused: The application is visible to the user and guaranteed to be running and provided with all necessary resources.
  • Unfocused: The application is not guaranteed to be running, i.e., it might not be granted CPU or GPU cycles and the policy is free to trigger a state transition to any of the not-running states. In the phone scenario, the app is not visible to the user.
  • Killed: The app’s process image has been removed from main memory.
  • Stopped: The app’s process is sigstop’d.
  • Stateless: The app’s process has been sigkill’d without prior state preservation. Serves as a way to reset an app’s state.

A “transparent” application lifecycle then translates to: Ensure that applications are able to preserve (and subsequently recreate) their state before being transitioned to the “not-running” meta state. This is indicated by dashed state transitions in the diagram. All of these transitions are preceded by a notification to the app that it is about to be stopped or killed, handing over an archive file that the app can serialize its state to during a grace period. After that, the app is actually transitioned to “not running”. When the app is resurrected, the system provides the archive back to the app and the app recreates its previous state. The diagram also reveals an interesting aspect: As we want to enable lifecycle policies to kill a stopped app, an application needs to preserve its state even if it is only being sigstop’d.
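The preserve-and-recreate cycle described above can be sketched as a small state machine. This is an illustrative model only, not the Ubuntu Touch API: the `App` class, its `serialize_state`/`restore_state` hooks and the JSON "archive" are assumptions, and the sketch does not validate which transitions the actual diagram allows.

```python
import json
from enum import Enum


class State(Enum):
    FOCUSED = "focused"
    UNFOCUSED = "unfocused"
    STOPPED = "stopped"      # process is sigstop'd
    KILLED = "killed"        # process image removed from main memory
    STATELESS = "stateless"  # killed without prior state preservation


RUNNING = {State.FOCUSED, State.UNFOCUSED}


class App:
    """Hypothetical app with hooks for the lifecycle notifications."""

    def __init__(self, name):
        self.name = name
        self.state = State.FOCUSED
        self.data = {}       # in-memory state of the running process
        self.archive = None  # state archive kept by the system

    def serialize_state(self):
        # Called during the grace period before the app stops running.
        return json.dumps(self.data)

    def restore_state(self, archive):
        # Called when the app is resurrected after having been killed.
        self.data = json.loads(archive)


def transition(app, target):
    """Move the app to `target`, preserving or restoring state as needed."""
    if target in (State.STOPPED, State.KILLED) and app.state in RUNNING:
        # Notify first: even a merely stopped app must save its state,
        # because the policy may kill it later without waking it up.
        app.archive = app.serialize_state()
    if target is State.KILLED:
        app.data = None                    # process memory is gone
    elif target is State.STATELESS:
        app.data, app.archive = None, None  # reset: nothing is preserved
    elif target is State.FOCUSED:
        if app.state is State.KILLED:
            app.restore_state(app.archive)
        elif app.state is State.STATELESS:
            app.data = {}                  # fresh start
    app.state = target


app = App("notes")
app.data = {"page": 3}
for step in (State.UNFOCUSED, State.STOPPED, State.KILLED, State.FOCUSED):
    transition(app, step)
print(app.data)  # the state survived being stopped and then killed
```

Note how the archive is written on the way into "stopped", so the later stopped-to-killed transition needs no cooperation from the (frozen) process.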

Application Lifecycle Policies

Based on the application lifecycle model presented before we can now start defining policies triggering the state transitions. Classical desktop behavior can be easily expressed in this model, too. The current desktop lifecycle policy never automatically triggers a state transition from the running to the “not-running” state and does not limit resources granted to an app when unfocused.

For version 1 on the phone (only considering the non-converged, standalone phone scenario) we are defining a very strict lifecycle policy. All non-focused apps are not guaranteed to stay in the “running” state and are transitioned to the “not running” state at the policy’s discretion to aggressively save resources. Today, we are already sigstop’ing app processes whenever they are unfocused, and we will go even further and kill unfocused apps when we detect memory pressure (even before the OOM killer potentially kicks in).
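The two policies described above can be sketched as interchangeable objects operating on the same state model. The class names, the `next_state` method and the boolean `memory_pressure` flag are hypothetical; how the real system detects memory pressure is not covered here.

```python
from enum import Enum


class State(Enum):
    FOCUSED = "focused"
    UNFOCUSED = "unfocused"
    STOPPED = "stopped"  # would correspond to sending SIGSTOP
    KILLED = "killed"    # would correspond to sending SIGKILL after preservation


class DesktopPolicy:
    """Classic desktop behavior: never leave the running states automatically."""

    def next_state(self, state, memory_pressure):
        return state


class StrictPhonePolicy:
    """Version 1 phone behavior: stop unfocused apps, kill under memory pressure."""

    def next_state(self, state, memory_pressure):
        if state is State.UNFOCUSED:
            return State.KILLED if memory_pressure else State.STOPPED
        if state is State.STOPPED and memory_pressure:
            return State.KILLED
        return state


def apply_policy(policy, apps, memory_pressure=False):
    """Return the new state of every app under the given policy."""
    return {name: policy.next_state(state, memory_pressure)
            for name, state in apps.items()}


apps = {"browser": State.FOCUSED, "music": State.UNFOCUSED, "mail": State.STOPPED}
print(apply_policy(DesktopPolicy(), apps))
print(apply_policy(StrictPhonePolicy(), apps))
print(apply_policy(StrictPhonePolicy(), apps, memory_pressure=True))
```

Because both policies share one interface, swapping them at runtime (for example when a phone is docked) amounts to passing a different object to `apply_policy`.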

Why are we so strict? We as a platform take on the responsibility to manage the scarce resources of a mobile device as efficiently as possible. We don’t want to leave it up to reviewers or users to identify and catch resource-hogging apps. However, this is only version 1, and we consciously decided on a very conservative approach that we can open up gradually going forward, as opposed to starting without a clear policy and taking away functionality over time.

Implications of a Strict Lifecycle Policy

Both internally and externally, a lot of discussions have been triggered by the strict lifecycle policy for version 1. We have discussed a multitude of different use cases, and almost all of them are solvable by separating apps into an engine part and a UI part. First, separating application logic from the presentation layer is good practice in software design. Second, relying on an engine/background service allows apps to escape the lifecycle “trap” easily by dispatching to an entity that outlives the app’s lifetime as dictated by our lifecycle policy. A nice side effect is easily testable code.
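As a rough illustration of the engine/UI split, consider the following sketch. `DownloadEngine` and `DownloadView` are made-up names, and both live in one process here purely for brevity; in the scenario described above the engine would be a background service that the lifecycle policy does not stop.

```python
class DownloadEngine:
    """Application logic only; knows nothing about windows or widgets."""

    def __init__(self):
        self.progress = {}

    def start(self, item):
        self.progress[item] = 0

    def advance(self, item, percent):
        self.progress[item] = min(100, self.progress[item] + percent)


class DownloadView:
    """Presentation only; can be destroyed and recreated at any time."""

    def __init__(self, engine):
        self.engine = engine

    def render(self):
        return [f"{item}: {pct}%"
                for item, pct in sorted(self.engine.progress.items())]


engine = DownloadEngine()
view = DownloadView(engine)
engine.start("file-a")
engine.advance("file-a", 40)

del view                      # the UI part is stopped or killed by the policy
engine.advance("file-a", 30)  # the engine keeps making progress meanwhile

view = DownloadView(engine)   # the UI is resurrected and simply re-renders
print(view.render())
```

The engine can be exercised in tests without any UI at all, which is the testability side effect mentioned above.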

How does this work for version 1 in Ubuntu Touch? In summary, the system will provide a set of system services that cover the most prominent examples and use cases identified from both external and internal discussions, e.g.:

  • Music playback in the background
  • Downloads happening in the background
  • Alarms/appointments

In this first version, apps will not be able to install their own background services/engines in the default setup. However, going forward, we will provide a mechanism for apps to hand their engine to the system and have it execute in the background (with resource constraints in place, though).

For readers interested in more details:

Filed under: planet-ubuntu, Ubuntu
