Canonical Voices

Posts tagged with 'technology'

ThomasVo5

This is the first article in a series of blog posts on Mir’s and XMir’s performance. The idea is to provide further insights into the overall performance work, point out existing bottlenecks and how the team is addressing them.

Our overall goal for Mir and XMir is to provide an absolutely fluid user experience, both in the case of typical desktop usage as well as in the case of more demanding usage scenarios like 3D gaming. More to this, our efforts to provide a fluent user-experience on the desktop should at most have a minimal impact on overall 3D application performance.

During the last weeks and months, a lot of people have raised the question if and to what degree the introduction of a system-level compositor impacts graphical performance. The short answer is: Yes, any additional layer between the GPU and the actual rendering process has an impact on the overall performance characteristics of the system. However, there are ways to avoid most of the overhead and this blog post is the not-so-short answer to the initial question.

As its name implies, a compositor is responsible for taking multiple buffer streams or surfaces and assembling (a.k.a. compositing) a final image that is then scanned out to the connected monitors. In the general case, composition requires buffering of the final image and it requires GPU resources to render the individual surfaces to the destination buffer in preparation for scanout. Here, the destination buffer is the framebuffer. The overhead of a system-level compositor can be summarized as this additional rendering step in the overall graphic pipeline, for the obvious benefit of being able to control the final output and enabling flicker-free boot, shutdown, resume, suspend and session-switching.

Both internally and externally, people have been measuring the overall performance impact with XMir as available from the archive today. Roughly speaking, people have been reporting a performance impact of ~20% in the Phoronix test suite and the question becomes: How can we significantly decrease the impact in the specific case of XMir while still keeping all the aforementioned benefits in place? The underlying idea to solve the issue is straightforward: If the compositor is clever enough, it could recognize situations where an opaque client surface does cover a complete output (XMir matches exactly this configuration). In that case, composition can be avoided and the client should be provided with a framebuffer as rendering target instead of the usual graphic memory  buffer. Moreover, the server-side composition strategy can be smart, and completely skip the final composition step and scan out the framebuffer as soon as the client signals “done”. Luckily, Mir’s composition engine and associated buffer allocation/swapping infrastructure allows for implementing this behavior easily and transparently to the client. The respective implementation has been living in https://code.launchpad.net/~vanvugt/mir/bypass for some time now, and we have been testing it in parallel to trunk. Our primary test and benchmarking platform was Intel, and we haven’t seen any issues with the patch on that platform. There is a graphical glitch present on ATI cards that we are actively working on. Nouveau gives us some headache as it is quite slow both on X and XMir right now. However, we are confident that we won’t see any major issues in XMir once the underlying cause in the Nouveau driver is fixed.

Results

Measuring graphical performance and developing meaningful benchmarks is a complex task on its own. Luckily, we have some pretty capable tooling available in the opensource world. During development and evaluation of the bypass feature, we have been relying on selected test-cases of Phoronix Test Suite and on glmark2 to continuously evaluate performance gains and overall impact. We are going to publish the results across Intel, NVIDIA and AMD GPUs as part of our regular QA reporting at http://reports.qa.ubuntu.com/graphics/ as soon as we hit trunk. In summary, we are able to reduce XMir’s total overhead to ~6% on Nexuiz and OpenArena (see section “Conclusions and Future Work” for reasons for and approaches to further reduce the remaining overhead). Please also note that we are actively investigating into the results for the “QGears2: OpenGL + Image scaling” test case:

GUI Toolkits - Intel 2500 GUI Toolkits - Intel 3000 GUI Toolkits - Intel 4000 Nexuiz HDR Off Nexuiz HDR On OpenArena

GLMark2 numbers are not yet reported via the public dashboard but we are actively working on wiring them up as part of our daily quality efforts, too. However, the numbers are quite promising as can be seen from this preview (Lenovo x220, i7 vPro, Intel(R) HD Graphics 3000):

x_vs_xmir_vs_bypass_fps

Conclusions & Future Work

Today, we are landing an important GPU-bound optimization for the XMir use-case with the bypass feature and we see significant performance improvements in our benchmarking scenarios. Everyday users will hardly notice any difference in graphical performance, but notice a decrease in power usage on laptops due to the system-compositor requiring less GPU and CPU cycles to carry out its tasks.

However, this is only the first step and we still see some overhead in the benchmarks. Our GLMark2 benchmark numbers for raw Mir when compared to X as in Saucy today suggest that we still have GPU-bound optimization potential that we should leverage in the XMir case. The unity-system-compositor performance is not the bottleneck in this specific scenario and we need to become more clever on the X side of things. In summary, we need to propagate the bypass approach further down into the X world and its clients with X/Compiz handing out the raw buffer provided by Mir to fullscreen, opaque X clients. Luckily, Compiz already knows about the notion of composite bypass, too and the remaining optimization potential lies mostly within X itself by making it more aware of the fact that it is living in a world of nested compositors now. Quite likely, though, Mir will require adjustments, too, to expose composition bypass end-to-end in the XMir scenario. Stay tuned, we will keep you posted within this series of blog posts.

[Update] Michael of Phoronix found out that some games, when run in fullscreen mode but not at native resolution, do not benefit from composition bypass. As mentioned in one of the comments, we are now starting to investigate into this sort of issues and will come back with updates once we identified the root causes. At any rate: Thanks for bringing it up, we will make sure that the respective benchmark/setup is present in our benchmarking setup, too.


Filed under: Canonical, planet-ubuntu, Quality, Technology, Ubuntu, Uncategorized

Read more
Robbie

20130722092922-edge-1-large

So I’m partially kidding…the Ubuntu Edge is quickly becoming a crowdfunding phenomena, and everyone should support it if they can.  If we succeed, it will be a historic moment for Ubuntu, crowdfunding, and the global phone industry as well. 

But I Don’t Wanna Talk About That Right Now

While I’m definitely a fan of the phone stuff, I’m a cloud and server guy at heart and what’s gotten me really excited this past month have been two significant (and freaking awesome) announcements.

#1 The Juju Charm Championship

easy_money_board_game_1974_121011_1

First off, if you still don’t know about Juju, it’s essentially our attempt at making Cloud Computing for Human Beings.  Juju allows you to deploy, connect, manage, and scale web services and applications quickly and easily…again…and again…AND AGAIN!  These services are captured in what we call charms, which contain the knowledge of how to properly deploy, configure, connect, and scale the services and applications you will want to deploy in the cloud.  We have 100′s of charms for every popular and well-known web service and application in use in the cloud today.  They’ve been authored and maintained by the experts, so you don’t have to waste your time trying to become one.  Just as Ubuntu depends on a community of packagers and developers, so does Juju.  Juju goes only as far as our Charm Community will take us, and this is why the Charm Championship is so important to us.

So….what is this Charm Championship all about?  We took notice of the fantastic response to the Cloud-Prize contest ran by our good friends (and Ubuntu Server users) over at Netflix.  So we thought we could do something similar to boost the number of full service solutions deployable by Juju, i.e. Charm Bundles.  If charms are the APT packages of the cloud, bundles are effectively the package seeds, thus allowing you to deploy groups of services, configured and interconnected all at once.  We’ve chosen this approach to increase our bundle count because we know from our experience with Ubuntu, that the best approach for growth will be by harvesting and cultivating the expertise and experience of the experts regularly developing and deploying these solutions.  For example, we at Canonical maintain and regularly deploy an OpenStack bundle to allow us to quickly get our clouds up for both internal use and for our Ubuntu Advantage customers.  We have master level expertise in OpenStack cloud deployments, and thus have codified this into our charms so that others are able to benefit.  The Charm Championship is our attempt to replicate this sharing of similar master level expertise across more service/application bundles…..BY OFFERING $30,000 USD IN PRIZE MONEY! Think of how many Ubuntu Edge phones that could buy you…well, unless you desperately need to have one of the first 50 :-).

#2 JujuCharms.com

Ironman JARVIS technologyFrom the very moment we began thinking about creating Juju years ago…we always envisioned eventually creating an interface that provides solution architects the ability to graphically create, deploy, and interact with services visually…replicating the whiteboard planning commonly employed in the planning phase of such solutions.

The new Juju GUI now integrated into JujuCharms.com is the realization of our vision, and I’m excited as hell at the possibilities opened and the technical roadblocks removed by the release of this tool.  We’ve even charmed it, so you can  ’juju deploy juju-gui’ into any supported cloud, bare metal (MAAS), or local workstation (via LXC) environment.  Below is a video of deploying OpenStack via our new GUI, and a perfect example of the possibilities that are opened up now that we’ve released this innovative and f*cking awesome tool:

The best part here, is that you can play with the new GUI RIGHT NOW by selecting the “Build” option on jujucharms.com….so go ahead and give it a try!

Join the Championship…Play with the GUI…then Buy the Phone

Cause I will definitely admit…it’s a damn sexy piece of hardware. ;)


Read more
ThomasVo5

Some time ago, Canonical started internal discussions about our convergence strategy, clearly spelling out the distant target of shaping and developing a single computing platform and operating system that is able to power the cloud, classic desktop machines, laptops, TV sets, phones and tablets (fridges anyone?). More to this, we stated that we want a single mobile device, a bundle of portable computing power together with the respective operating system, that seamlessly adapts to different form-factors and use-cases. Or to put it a little more catchy:

Your phone is your TV, is your desktop, is your … you name it.

And yes, we were aiming for the moon. We worked hard to draft a system that would allow us to reach the moon, while at the same time catering towards our other central goal of providing a beautiful and lean user experience. It became apparent in conversations and discussions that one of the cornerstones of the future system we were designing will be the graphics stack, including the Unity shell. After much discussions about existing solutions (X & Weston), and how we could leverage them, we took a step back and distilled down our (technical) requirements for a future graphics stack:

  • Tailored towards an EGL/GL(ES) world.
  • Minimal assumptions regarding the underlying driver model.
  • Ability to leverage existing drivers implementing the Android driver model.
  • Ability to leverage existing hardware compositors.
  • Efficient, in terms of memory-, CPU- and GPU-consumption/usage.
  • Tightly integrated with the Unity shell, fulfilling the shell’s requirements while at the same time not dictating any sort of semantics up the stack.
  • An efficient and secure input subsystem supporting demanding mobile use-cases.
  • Fully tested from the ground up.
  • Adaptable to future requirements.

With these priorities in mind, we revisited and carefully evaluated existing solutions again and found that neither of them satisfies our requirements. In particular, X and its driver model is way too complicated and feature-laden, resulting in a less efficient system and a driver model that is unlikely to be widely supported on mobile platforms. In the case of Weston, the lack of a clearly defined driver model as well as the lack of a rigorous development process in terms of testing driven by a set of well-defined requirements gave us doubts whether it would help us to reach the “moon”. We looked further and found Google’s SurfaceFlinger, a standalone compositor that fulfilled some but not all of our requirements. It benefits from its consistent driver model that is widely adopted and supported within the industry and it fulfills a clear set of requirements. It’s rock-solid and stable, but we did not think that it would empower us to fulfill our mission of a tightly integrated user experience that scales across form-factors. However, SurfaceFlinger was chosen as our initial solution for getting started with the overall Ubuntu Touch project, planning to replace SurfaceFlinger with Mir as soon as possible.

In summary, our evaluation of existing solutions led us to the conclusion that neither of them fits our requirements and that adjusting/adapting them would require substantial efforts, too. For this, we decided to go for our own solution, catering directly towards our goals and our vision, effectively saying: Yes, we are going to do our own display server.

The project was born and we decided to name it Mir (as in the space station), as it would be our outpost that finally enables us to reach our goal of arriving at the moon, providing a seamless and beautiful user experience spanning multiple form-factors. The team set off to work on Mir while large parts of Canonical started working on the Ubuntu Touch project (relying on SurfaceFlinger). A lot of knowledge and experience was and is transferred from the Ubuntu Touch project to the Mir team, with the work on the phone and tablet revealing more and more technical requirements and subtleties that heavily influenced the Mir architecture and implementation. Thus, one of the earliest technical decisions that has been taken by the team concerned the protocol that applications rely upon to communicate to Mir. We evaluated Wayland and compared it to the SurfaceFlinger approach, realizing that neither of them fits our needs. Wayland is a very promising and sensible attempt at standardizing the way that applications talk to a display server, however, it exposes privileged sections like the shell integration that we planned to handle differently, both for security reasons and as we wanted to decouple the way the shell works on top of the display server from the application-facing protocol. In the end, we decided to implement Mir’s core in a protocol-agnostic way, considering the communication protocol a very important part of the overall story, but not the driving one. We have chosen Google’s protocol buffers as our data and interface description language (DDL & IDL) and implemented a lean RPC layer that abstract away transport-specific details (relying on an ordinary socket by default). It is well tested and the protobuf IDL and DDL provide us with forward- as well as backward-versioning capabilities. However, the communication component of the overall system is adaptable and can be adjusted to account for future requirements and developments.

A final word on timelines: At the time of this writing, we are preparing to deploy the next version of the Unity shell tightly integrated with Mir in the not-too-distant future on the phone and the tablet. However, regarding the desktop use-case: We have integrated X, leveraging the prior XWayland work, to run on top of Mir (XMir) and started initial porting efforts for toolkits with a focus on Qt5. With our X-integration in place, you can run Mir on your desktop machine if your system runs a GPU that supports the free driver stack. For the closed-source desktop drivers: We are in active conversations with GPU vendors to enable Mir on those drivers/GPUs, too. [Updated] More to this, we are working together with NVIDIA towards a more unified driver model sitting on top of EGL.

As we are moving along with the development of Mir, we will work on integrating existing toolkits with the new display stack to allow application developers a seamless transition between different form-factors and prevent them from the burden of supporting multiple different display servers. As noted before, Qt5 is our first target here, with GTK3 and XUL being in the pipeline. To support legacy X applications that cannot be adjusted to work against Mir, we will provide a translation-layer that leverages the existing XMir implementation and allows legacy X apps to use Mir transparently.

And that’s pretty much it. Thanks for reading. I hope I could give you some insight into the history of Mir, its motivation and its trajectory. If you want to dive even further into the Mir world, head over to wiki.ubuntu.com/MirSpec. A lot of work has already been done, but more of it lies ahead of us. So if you are curious, have questions, want to help us or simply want to get to know the team, join the Mir sessions at the upcoming virtual UDS. You might also want to read Olli’s description of the bigger picture, including our overall convergence strategy for Unity.

To the moon,

Thomas


Filed under: Technology, Ubuntu, Uncategorized

Read more
sfmadmax

So I use Xchat daily and connect to a private IRC server to talk with my colleagues. I also have a BIP server in the office to record all of the IRC transcripts, this way I never miss any conversations regardless of the time of day. Because the BIP server is behind a firewall on the companies network I can’t access it from the outside.  For the past year I’ve been working around this by connecting to my companies firewall via ssh and creating a SOCKS tunnel then simply directing xchat to talk through my local SOCKS proxy.

To do this ,  open a terminal and issue:

ssh -CND <LOCAL_IP_ADDRESS>:<PORT> <USER>@<SSH HOST>

Ex: ssh -CND 192.168.1.44:9999 sfeole@companyfirewall.com

Starting ssh with -CND:

‘D’ Specifies a local “dynamic” application-level port forwarding. This works by allocating a socket to listen to port on the local side, optionally bound to the specified bind_address. It also adds compression to the datastream ‘C’ and the ‘N’ is a safeguard which protects the user from executing remote commands.

192.168.1.44 is my  IPv4 address

9999 is the local port i’m going to open and direct traffic through

After the SSH tunnel is open I now need to launch xchat, navigate to Settings -> Preferences -> Network Setup and configure xchat to use my local IP (192.168.1.44) and local port (9999) then press OK then Reconnect.

I should now be able to connect to the IRC server behind the firewall. Usually I run through this process a few times a day, so it becomes somewhat of a tedious annoyance after a while.

Recently I finished a cool python3 script that does all of this in quick command.

The following code will do the following:

1.) identify the ipv4 address of the interface device you specify

2.) configure xchat.conf to use the new ipv4 address and port specified by the user

3.) open the ssh tunnel using the SSH -CND command from above

4.) launch xchat and connect to your server (assuming you have it set to auto connect)

To use it simply run

$./xchat.py -i <interface> -p <port>

ex: $./xchat.py -i wlan0 -p 9999

the user can select wlan0 or eth0 and of course their desired port. When your done with the tunnel simply issue <Ctrl-C> to kill it and wala!

https://code.launchpad.net/~sfeole/+junk/xchat

#!/usr/bin/env python3
#Sean Feole 2012,
#
#xchat proxy wrapper, for those of you that are constantly on the go:
#   --------------  What does it do? ------------------
# Creates a SSH Tunnel to Proxy through and updates your xchat config
# so that the user does not need to muddle with program settings

import signal
import shutil
import sys
import subprocess
import argparse
import re
import time

proxyhost = "myhost.company.com"
proxyuser = "sfeole"
localusername = "sfeole"

def get_net_info(interface):
    """
    Obtains your IPv4 address
    """

    myaddress = subprocess.getoutput("/sbin/ifconfig %s" % interface)\
                .split("\n")[1].split()[1][5:]
    if myaddress == "CAST":
        print ("Please Confirm that your Network Device is Configured")
        sys.exit()
    else:
        return (myaddress)

def configure_xchat_config(Proxy_ipaddress, Proxy_port):
    """
    Reads your current xchat.conf and creates a new one in /tmp
    """

    in_file = open("/home/%s/.xchat2/xchat.conf" % localusername, "r")
    output_file = open("/tmp/xchat.conf", "w")
    for line in in_file.readlines():
        line = re.sub(r'net_proxy_host.+', 'net_proxy_host = %s'
                 % Proxy_ipaddress, line)
        line = re.sub(r'net_proxy_port.+', 'net_proxy_port = %s'
                 % Proxy_port, line)
        output_file.write(line)
    output_file.close()
    in_file.close()
    shutil.copy("/tmp/xchat.conf", "/home/%s/.xchat2/xchat.conf"
                 % localusername)

def ssh_proxy(ProxyAddress, ProxyPort, ProxyUser, ProxyHost):
    """
    Create SSH Tunnel and Launch Xchat
    """

    ssh_address = "%s:%i" % (ProxyAddress, ProxyPort)
    user_string = "%s@%s" % (ProxyUser, ProxyHost)
    ssh_open = subprocess.Popen(["/usr/bin/ssh", "-CND", ssh_address,
                 user_string], stdout=subprocess.PIPE, stdin=subprocess.PIPE)

    time.sleep(1)
    print ("")
    print ("Kill this tunnel with Ctrl-C")
    time.sleep(2)
    subprocess.call("xchat")
    stat = ssh_open.poll()
    while stat is None:
        stat = ssh_open.poll()

def main():
    """
    Core Code
    """

    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--interface',
                        help="Select the interface you wish to use",
                        choices=['eth0', 'wlan0'],
                        required=True)
    parser.add_argument('-p', '--port',
                        help="Select the internal port you wish to bind to",
                        required=True, type=int)
    args = parser.parse_args()

    proxyip = (get_net_info("%s" % args.interface))
    configure_xchat_config(proxyip, args.port)
    print (proxyip, args.port, proxyuser, proxyhost)

    ssh_proxy(proxyip, args.port, proxyuser, proxyhost)

if __name__ == "__main__":
    sys.exit(main())

Refer to the launchpad address above for more info.


Read more
sfmadmax

So looking for new ways to extend your laptop battery life??  Just recently I found a great combo that involves using a very cool application called “Jupiter

You can grab Jupiter from the launchpad PPA @ https://launchpad.net/~webupd8team/+archive/jupiter

I have yet to find a good application that handles “On Demand” mode relatively well. This app clocks down your processors when on battery to their lowest setting and kicks them back up once A/C power is restored. I have used other linux power mgmt tools but haven’t had a great experience. I have a system76 Pangolin and it’s pretty power hungry, it’s pretty much a mobile desktop and during the Natty / Oneiric releases of Ubuntu I was lucky to get 40 minutes on the beast. But that was because everything was running full power, After installing Jupiter and making some additional changes I managed to turn 40 minutes into 2 hours. Not bad eh?

Some of the additional changes I made involved the following:

Taking /var/log and completely mounting it to tmpfs. This way we are writing straight to memory, not needing to bother the disk constant reads/writes. Take note that this causes your logs to clear out at the end of every reboot/shutdown, but I’ve seen improvement.

So first we need to make some modifications in /etc/fstab

tmpfs /tmp tmpfs defaults,noatime,mode=1777 0 0
tmpfs /var/log tmpfs defaults,noatime,mode=1777 0 0
tmpfs /var/tmp tmpfs defaults,noatime,mode=1777 0 0

Save that off

Then lets carry out the following.

$ sudo service rsyslog status // to check if it’s up and running
$ sudo service rsyslog stop
$ sudo rm -rf /tmp/*
$ sudo rm -rf /var/log/*
$ sudo rm -rf /var/tmp/*
$ sudo mount -a
$ sudo service rsyslog start

Now you will notice all system logs will be directed to /tmp. Give it a try for a week or two and see if you notice any difference in your battery life.


Read more
ThomasVo5

This post explains how to conduct large-scale MOO experiments with the SHARK machine learning library on clusters running Oracle grid engine.

An experiment consists of three phases:

  1. front approximation
  2. performance indicator calculation
  3. result accumulation and statistics calculation

Within this post, I’m going to focus on the first step.

Front Approximation

In this phases, the Pareto front approximations generated by applying multiple multi-objective evolutionary algorithms (MOEAs) to a set of objective functions are recorded.

Here, I assume that we want to evaluate the (µ+1)-MO-CMA-ES relying on the hypervolume indicator on the DTLZ suite of benchmark functions. A ready-to-use command-line application implementing the MO-CMA-ES is bundled with the default Shark installation. The executable is configurable via command-line arguments queryable by passing –help:

  --objectiveFunction arg
  --seed arg (=1)
  --storageInterval arg (=100)
  --searchSpaceDimension arg (=10)
  --maxNoEvaluations arg (=50000)
  --timeLimit arg (=1000)
  --fitnessLimit arg (=1e-10)
  --resultDir arg (=.)
  --algorithmConfigFile arg
  --algorithmUsage 
  --defaultAlgorithmUsage 
  --objectiveSpaceDimension arg (=2)
  --reportFitnessFunctions 

That is, to execute the MO-CMA-ES for DTLZ2 with 3 objectives and terminating after 50000 objective function evaluations, the following call is required:

  SteadyStateMOCMAMain --objectiveFunction=DTLZ2 --objectiveSpaceDimension=3 --maxNoEvaluations=50000 

Note that we do not specify the rng seed explicitly but rely on the default value 1.

For the scenario considered here, we want to run several independent trials of one specific MOEA and one specific objective function in parallel. To this end, we rely on the array job feature of the grid engine and submit an array of 25 independent trials to the grid engine with the following command:

  qsub -N 'DTLZ2_3' -t 1-25 RunAlgo.sh DTLZ2 /globally/known/path 3

Here, the script RunAlgo.sh is defined as follows:

#!/bin/bash
#$ -S /bin/bash
#$ -o /dev/null

SteadyStateMOCMAMain --seed $SGE_TASK_ID --resultDir=$2 --objectiveFunction=$1 --objectiveSpaceDimension=$3

In summary, the script takes care of actually running the algorithm and setting the seed to environment variable $SGE_TASK_ID. The variable is set by the grid engine to the unique job number and thus, we can ensure independent trials. There is one more thing to note: The result dir needs to be known across the whole cluster. Normally, your dev ops provide you with a scratch environment that is accessible from every computing node.

That’s it. Wait a few minutes until the experiment completes and stay tuned for the second post that explains how to evaluate the quality of the Pareto-front approximations.


Filed under: C++, Shark, Technology, Uncategorized

Read more
ThomasVo5

(µ+1)-MO-CMA-ES

A brief video of the (µ+1)-MO-CMA-ES solving DTLZ2 with 3 objectives.


Filed under: Shark, Technology

Read more