Canonical Voices

Posts tagged with 'devops'

Mark Baker

The telco business has long prided itself on providing dependable services all day, every day. Today, dial tones generally survive earthquakes, hurricanes, wars and power cuts, and that is testimony to the service quality telcos provide. This high level of service quality runs through a telco’s DNA, which gives their applications the renowned ‘telco-grade’ quality, scalability and constant availability. But creating such a culture comes at a cost.

The standards are a result of the tightly controlled software used by telcos, which has been tested over many years. Strict processes are employed to minimise the chance of failure of any item in the service, and robust backup or failover services are provided in the event of failure. While this is essential to deliver failsafe services, it also creates a restrictive environment in which launching new services based on new technologies is severely hampered.

As a result, new technology businesses are out-maneuvering telcos by offering services based on the latest development frameworks. These are put together using agile processes and pushed into production by super-smart DevOps engineers who have planned application architectures on the assumption that failures will happen. Whether it is Infrastructure as a Service (IaaS) platforms, a move towards IP-based voice and data services, or mobile application delivery services that drive customer engagement and retention, startups and tech companies are all delivering strong solutions into the market and putting pressure on telcos to do the same.

The Telco Application Developer Summit (TADS) in Bangkok, November 21st and 22nd, aims to accelerate the pace of new service delivery for telcos by enabling developers to discuss the benefits of DevOps and agile practices. With Ubuntu being at the centre of many of the recent innovations in the high-tech space, be it OpenStack cloud, Platform as a Service (PaaS), Software Defined Networking (SDN) or public cloud computing, we are very excited to be a part of this conference. We will be in attendance, demonstrating technologies such as Juju, which enables services to be launched and scaled instantly. If you are involved in the delivery of application services for telcos, you should check TADS out, and maybe we will see you there.

Sidnei

As part of my job as Operations Engineer on the Ubuntu One team, I’m constantly looking for ways to improve the reliability and operational efficiency of the service as a whole.

Very high on my list of things to fix is an item to look into Vaurien and make some tweaks to the service so that it copes better with outages in other parts of the system.

As you have probably realized by now if you’re somehow involved in the design and maintenance of any kind of distributed system, network partitions are a big deal, and we’ve had some of those affect our service in the past, some with very interesting effects.

Take our application servers for example. They’ve been through many generations of rewrites, and switches from one WSGI server to another in the past (not long before I joined the team), each of them with a particular issue. Either they didn’t scale, or crashed constantly, or had memory leaks. Or maybe none of those (it was before I joined the team, so I wouldn’t know for sure). By the time I joined, Paste, of all the things, was one of the WSGI servers in use in part of the service, and Twisted WSGI was used in another part.

(The actual setup of those services is very interesting in itself. It’s a mix of Twisted and Django, and many others have done this before, so it’s not very unique. But there are internal details which are quite interesting. More below.)

Having moved from another team that used Twisted heavily, I decided to call it out and settle on Twisted WSGI, which seemed just fine.

As for the stability and memory issues, we started ironing them out one by one. Turns out the majority of the problems had nothing to do with the WSGI server itself, but everything to do with not cleaning up resources correctly, be it temporary files, file descriptors, or cycles between producers and consumers.

And everything was perfect.

But then we got a few networking and hardware issues. Some of the servers were eventually moved to a different datacenter and things got even more interesting. I’ll go into the details of the specific problems that I’m hoping to approach with Vaurien in a different post, but suffice to say that talking to many external services in a threaded server doesn’t get pretty when there’s a network blip.

So speaking of threaded and Twisted, and coming to the subject of this post.

In front of a subset of our services we currently have 4 HAProxy instances on different servers. They are all set up to use httpchk every 2 seconds, which by default sends an OPTIONS request to ‘/’. If you’re still following, we have a Django app running there, and depending on how you have your Django app configured, it might just take that OPTIONS request and treat it just like a GET, effectively (in our case) rendering a response just as if a normal browser had requested it. Turns out that page is not particularly lean in our case.

So you take 4 servers, each effectively doing a GET request to your homepage every 2 seconds, times many processes serving that page across a couple of hosts, and you have a full plate for someone looking for things to optimize.
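To put a rough number on that, here is a small back-of-the-envelope sketch. The 4 proxies and the 2-second interval come from the setup described above; the count of 16 backend processes is purely a made-up figure for illustration, since the real number isn't given here.

```python
def health_check_rate(proxies, interval_s, backends):
    """Return the total health-check requests per second across all backends."""
    # Every proxy probes every backend process once per interval.
    return proxies * backends / interval_s


# 4 proxies and a 2-second interval are from the post;
# 16 backend processes is a hypothetical figure.
per_backend = health_check_rate(proxies=4, interval_s=2, backends=1)
total = health_check_rate(proxies=4, interval_s=2, backends=16)
print(per_backend, total)
```

Under those assumptions every process renders the homepage twice a second for nobody in particular, before any real traffic arrives.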

To make it more fun, early on I added monitoring of the thread pool used by Twisted WSGI, sending metrics to Graphite. Whenever we had a network blip we saw the queue growing and growing without bound. This was actually a combination of a couple of things, which I’m still working on fixing:

  1. HAProxy will keep triggering the httpchk after the service is taken out of rotation.
  2. Twisted WSGI will keep accepting requests and throwing them in the thread pool queue, even if the thread pool is busy and the queue is building up.
  3. We currently do a terrible job of timing out connections to external services, so a minor blip can easily cause the thread pool queue to build up.
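On the third point, the general idea of bounding how long a worker thread can be stuck on an external service might look something like the sketch below. This is not our actual code: `fetch_with_timeout` is a hypothetical helper, the raw socket stands in for whatever client library is really in use, and the 2-second default is an arbitrary choice.

```python
import socket


def fetch_with_timeout(host, port, payload, timeout=2.0):
    """Send payload and read one reply, giving up after `timeout` seconds.

    Returns the reply bytes, or None if the connection failed or timed out.
    """
    try:
        # The timeout covers the connect and every later send/recv call.
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(payload)
            return conn.recv(4096)
    except OSError:
        # socket.timeout is a subclass of OSError, so a slow or dead
        # peer ends up here instead of blocking the thread forever.
        return None
```

The point is simply that the thread is handed back to the pool after a bounded wait, rather than sitting on a dead connection while the queue grows behind it.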

As a strategy to alleviate that problem I came up with the following solution:

  1. Implement a custom thread pool that triggers a callback when going from busy -> free and from free -> busy (where busy is defined as: there are more requests queued than idle threads).
  2. Change the response to the HAProxy httpchk to simply check that busy/free state.
  3. Change the handling of that HAProxy check to *not* go through the thread pool.

(There are a few more details that I won’t get into in this post, but that’s the high-level summary.)
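The three steps above can be sketched roughly as follows. This is a minimal, standalone illustration and not the real implementation: `BusyTracker` and `health_check` are hypothetical names, the class only does the bookkeeping a custom thread pool would need (it doesn't run any threads itself), and returning a 503 for the busy state is an assumption that relies on HAProxy treating only 2xx/3xx responses as a passing check.

```python
import threading


class BusyTracker:
    """Bookkeeping a thread pool could use to report busy/free changes.

    "Busy" follows the definition above: more requests are queued
    than there are idle threads.
    """

    def __init__(self, workers, on_busy, on_free):
        # RLock so a callback may safely read `busy` on the same thread.
        self._lock = threading.RLock()
        self._idle = workers
        self._queued = 0
        self._busy = False
        self._on_busy = on_busy
        self._on_free = on_free

    def _update(self):
        busy = self._queued > self._idle
        if busy != self._busy:
            # Fire a callback only when the state actually flips.
            self._busy = busy
            (self._on_busy if busy else self._on_free)()

    def request_queued(self):
        with self._lock:
            self._queued += 1
            self._update()

    def request_started(self):
        # A worker thread picked a request off the queue.
        with self._lock:
            self._queued -= 1
            self._idle -= 1
            self._update()

    def request_finished(self):
        with self._lock:
            self._idle += 1
            self._update()

    @property
    def busy(self):
        with self._lock:
            return self._busy


def health_check(tracker):
    """Answer the load balancer check without touching the thread pool."""
    return (503, b"busy") if tracker.busy else (200, b"free")
```

Because `health_check` only reads a flag, it can be answered directly from the reactor thread even when every worker is stuck, which is what lets the load balancer notice the busy state in the first place.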

I have good confidence that this will fix (or at least alleviate) the main issue, which is the queue growing without bounds in the thread pool, and it will instead move the queueing to HAProxy. But after looking through the metrics today I saw an unintended consequence of the changes.

[Graph: busy-threads]

[Graph: load-avg]

In retrospect, it seems fairly obvious that this was to be one of the expected outcomes. I was simply surprised to see it since it was not the immediate goal of the proposed changes, but simply a side effect of them.

I hope you enjoyed this glimpse into what goes on at the heart of my job. I expect to write more about this soon, and maybe explore some of the details that I didn’t get into, since this post is already too long.

Mark Baker

As clouds for IT infrastructure become commonplace, admins and devops need quick, easy ways of deploying and orchestrating cloud services.  As we mentioned in October, Ubuntu now has a GUI for Juju, the service orchestration tool for server and cloud. In this post we wanted to expand a bit more on how Juju makes it even easier to visualise and keep track of complex cloud environments.

Juju provides the ability to rapidly deploy cloud services on OpenStack, HP Cloud, AWS and other platforms using a library of 100 ‘charms’ which cover applications from node.js to Hadoop. Juju GUI makes Juju even easier to use than the command line interface alone, giving you the ability to deploy, manage and track progress visually as your cloud grows (or shrinks).

Juju GUI is easy and intuitive. To start, you simply search for the service you want in the Juju GUI charm search bar (top right on the screen). In this case I want to deploy WordPress to host my blog site. I have the chance to alter the WordPress settings, and with a few clicks the service is ready. It’s displayed as an icon on the GUI.

I then want a MySQL service to go alongside. Again I search for the charm, set the parameters (or accept the defaults) and away we go.

It’s even easier to build the relations between these services by point and click. Juju knows that the relationship needs a suitable database link.

I can expose WordPress to users by setting the expose flag, at the bottom of the settings screen, to on. To scale up WordPress I can add more units, creating identical copies of the WordPress deployment, including any relationships. I have selected ten in total, and this shows in the centre of the WordPress icon.

And that’s it.

For a simple cloud, Juju or other tools might be sufficient. But as your cloud grows, Juju GUI will be a wonderful way not only to provision and orchestrate services, but more importantly to validate and check that you have the correct links and relationships. It’s an ideal way to replicate and scale cloud services as you need.

For more details of Juju, go to juju.ubuntu.com.  To try Juju GUI for yourself, go to http://uistage.jujucharms.com:8080/

Mark Baker

Hardened sysadmins and operators often spurn graphical user interfaces (GUIs) as being slow, cumbersome, unscriptable and inflexible. GUIs are for wimps, right?

Well, I’m not going to argue – and certainly, command line interfaces (CLIs) have their benefits, for those comfortable using them. But we are seeing a pronounced change in the industry, as developers start to take a much greater interest in the deployment and operation of flexible, elastic services in scale out or cloud environments. Whilst many of these new ‘devops’ are happy with a CLI, others want to be able to visualise their environment. In the same way that IDEs are popular, being able to see a representation of the services that are running and how they are related can prove extremely valuable. The same goes for launching new services or removing existing ones.

This is why, last week, as part of the new Ubuntu 12.10 release, we announced a GUI for Juju, the Ubuntu service orchestration tool for server and cloud.

The new Juju GUI does all these things and more. For those of you unfamiliar with it, Juju uses a service definition file known as a ‘charm’. Much of the magic in Juju comes from the collective expertise that has gone into developing the charm. It enables you to deploy complex services without intimate knowledge of the best practice associated with that service. Instead, all that deployment expertise is encapsulated in the charm.

Now, with the Juju GUI, it gets even easier. You can select services from a library of nearly 100 charms, covering applications from node.js to Hadoop. And you can deploy them live on any of the providers that Juju supports – OpenStack, HP Cloud, Amazon Web Services and Ubuntu’s Metal-as-a-Service. You can add relations between services while they are running, explore the load on them, upgrade them or destroy them. At the OpenStack Summit in San Diego this year, Mark Shuttleworth even used it to upgrade a running* OpenStack Cloud from Essex to Folsom.

Since the Juju GUI was first shown, the interest and feedback have been tremendous. It certainly seems to make the magic of Juju – and what it can do for people – easier to see. If you haven’t seen it already, check out the screenshots below or visit http://uistage.jujucharms.com:8080/

Because as we’ve always known, a picture really is worth a thousand words.

[Image: The Juju GUI]

*Running on Ubuntu Server, obviously.

Cezzaine Haigh

The cloud is disrupting the enterprise computing world, driven by the growth of open-source software. As a result, new opportunities are emerging; it’s time to exploit them. 

On 30th October, Canonical will host an Ubuntu Enterprise Summit in Copenhagen. Industry analysts and enterprise users of Ubuntu and open source technologies will join key figures from Canonical to discuss the opportunities these converging trends present.

The event is designed around three key topics:

- How flexibility creates business value
- Choosing which bandwagon to board
- The way ahead, from client to cloud

With a keynote from Ubuntu founder Mark Shuttleworth and two streams of content – one aimed at business decision-makers and the other at enterprise technologists – it offers an essential briefing on delivering effective IT in a cloud-obsessed world.

Learn more and register your place.
