Canonical Voices

What Robin's blog talks about

(Also posted on the Canonical blog)

On 10th September 2014, Canonical are joining in with Internet Slowdown day to support the fight for net neutrality.

Along with Reddit, Tumblr, Boing Boing, Kickstarter and many more sites, we will be sporting banners on our main sites, www.ubuntu.com and www.canonical.com.

Net neutrality

From Wikipedia:

Net neutrality is the principle that Internet service providers and governments should treat all data on the Internet equally, not discriminating or charging differentially by user, content, site, platform, application, type of attached equipment, and modes of communication.

Internet Slowdown day

#InternetSlowdown day is in protest against the FCC’s plans to allow ISPs in America to offer certain companies “paid prioritization” of their traffic.

If large companies were allowed to pay ISPs to prioritise their traffic, it would be much harder for competing companies to enter the market, effectively giving large corporations a greater monopoly.

I believe that internet service providers should conform to common carrier laws where the carrier is required to provide service to the general public without discrimination.

If you too support net neutrality, please consider signing the Battle for the net petition.

Read more

(This article was originally posted on design.canonical.com)

On release day we can get up to 8,000 requests a second to ubuntu.com from people trying to download the new release. In fact, last October (13.10) was the first release day in a long time that the site didn't crash under the load at some point during the day (huge credit to the infrastructure team).

Ubuntu.com has been running on Drupal, but we've been gradually migrating it to a more bespoke Django based system. In March we started work on migrating the download section in time for the release of Trusty Tahr. This was a prime opportunity to look for ways to reduce some of the load on the servers.

Choosing geolocated download mirrors is hard work for an application

When someone downloads Ubuntu from ubuntu.com (on a thank-you page), they are actually sent to one of the 300 or so mirror sites - specifically, one near them.

To pick a mirror for the user, the application has to:

  1. Decide from the client's IP address what country they're in
  2. Get the list of mirrors and find the ones that are in their country
  3. Randomly pick them a mirror, while sending more people to mirrors with higher bandwidth

This process is by far the most intensive operation on the whole site, not because these tasks are particularly complicated in themselves, but because this needs to be done for each and every user - potentially 8,000 a second - while every other page on the site can be aggressively cached to prevent most requests from hitting the application itself.

For the site to be able to handle this load, we'd need to load-balance requests across perhaps 40 VMs.

Can everything be done client-side?

Our first thought was to embed the entire mirror list in the thank-you page and use JavaScript in the users' browsers to select an appropriate mirror. This would drastically reduce the load on the application, because the download page would then be effectively static and cache-able like every other page.

The only way to reliably get the user's location client-side is with the geolocation API, which is only supported by 85% of users' browsers. Another slight issue is that the user has to give permission before they can be assigned a mirror, which would slightly hinder their experience.

This solution would inconvenience users just a bit too much. So we found a trade-off:

A mixed solution - Apache geolocation

mod_geoip2 for Apache can apply server rules based on a user's location and is much faster than doing geolocation at the application level. This means that we can use Apache to send users to a country-specific version of the download page (e.g. the UK desktop thank-you page) by adding ?country=GB to the end of the URL.
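As a rough illustration of the idea (not the actual ubuntu.com configuration - the directives below are from the stock Apache GeoIP module and the URL is a placeholder):

# Illustrative sketch only
GeoIPEnable On
GeoIPDBFile /usr/share/GeoIP/GeoIP.dat

RewriteEngine On
# Capture the visitor's two-letter country code and append it to the thank-you URL
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(..)$
RewriteRule ^/download/desktop/thank-you$ /download/desktop/thank-you?country=%1 [R,L]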

These country specific pages contain the list of mirrors for that country, and each one can now be cached, vastly reducing the load on the server. Client-side JavaScript randomly selects a mirror for the user, weighted by the bandwidth of each mirror, and kicks off their download, without the need for client-side geolocation support.
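As a rough sketch of how that client-side selection might work (the mirror list and bandwidth figures here are invented for illustration - this isn't the actual ubuntu.com code):

// Hypothetical mirror list embedded in the country-specific page
var mirrors = [
    { url: 'http://mirror-one.example.com/ubuntu/', bandwidth: 1000 },
    { url: 'http://mirror-two.example.com/ubuntu/', bandwidth: 100 }
];

// Pick a mirror at random, weighted by each mirror's bandwidth
function pickMirror(mirrorList) {
    var total = 0;
    for (var i = 0; i < mirrorList.length; i++) {
        total += mirrorList[i].bandwidth;
    }
    var target = Math.random() * total;
    for (var j = 0; j < mirrorList.length; j++) {
        target -= mirrorList[j].bandwidth;
        if (target <= 0) {
            return mirrorList[j];
        }
    }
    return mirrorList[mirrorList.length - 1];
}

// Kick off the download
window.location = pickMirror(mirrors).url;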

This solution was successfully implemented shortly before the release of Trusty Tahr.

Read more

Docker is a fantastic tool for running container images and managing lightweight Linux containers extremely quickly.

One thing this has been very useful for in my job at Canonical is quickly running older versions of Ubuntu - for example to test how to install specific packages on Precise when I'm running Trusty.

Installing Docker

The simplest way to install Docker on Ubuntu is using the automatic script:

curl -sSL https://get.docker.io/ubuntu/ | sudo sh

You may then want to authorise your user to run Docker directly (as opposed to using sudo) by adding yourself to the docker group:

sudo gpasswd -a [YOUR-USERNAME] docker

You need to log out and back in again before this will take effect.

Spinning up an old version of Ubuntu

With Docker installed, you should be able to run it as follows. The example below is for Ubuntu Precise, but you can replace "precise" with any available Ubuntu version:

mkdir share  # Shared folder with docker image - optional
docker run -v `pwd`/share:/share -i -t ubuntu:precise /bin/bash  # Run ubuntu, with a shared folder
root@cba49fae35ce:/#  # We're in!

The -v `pwd`/share:/share part mounts the local ./share/ folder at /share/ within the Docker instance, for easily sharing files with the host OS. Setting this up is optional, but might well be useful.

There are some important things to note:

  • This is a very stripped-down operating system. You are logged in as the root user, your home directory is the filesystem root (/), and very few packages are installed. Almost always, the first thing you'll want to run is apt-get update. You'll then almost certainly need to install a few packages before this instance will be of any use.
  • Every time you run the above command it will spin up a new instance of the Ubuntu image from scratch. If you log out, retrieving your current instance in that same state is complicated. So don't log out until you're done. Or learn about managing Docker containers (see the sketch after this list).
  • In some cases, Docker will be unable to resolve DNS correctly, meaning that apt-get update will fail. In this case, follow the guide to fix DNS.
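If you do later want to get back to a container you've exited, the basic commands look like this (the container ID is just the example one from above):

docker ps -a               # list all containers, including ones that have exited
docker start cba49fae35ce  # start a stopped container by its ID
docker attach cba49fae35ce # re-attach to its shell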

Read more

Fix Docker's DNS

Docker is really useful for a great many things - including, but not limited to, quickly testing older versions of Ubuntu. If you've not used it before, why not try out the online demo?

Networking issues

Sometimes Docker is unable to use the host OS's DNS resolver, resulting in a DNS resolution error within your Docker container:

$ sudo docker run -i -t ubuntu /bin/bash  # Start a docker container
root@0cca56c41dfe:/# apt-get update  # Try to Update apt from within the container
Err http://archive.ubuntu.com precise Release.gpg
Temporary failure resolving 'archive.ubuntu.com'  # DNS resolve failure
..
W: Some index files failed to download. They have been ignored, or old ones used instead.

How to fix it

We can fix this by explicitly telling Docker to use Google's DNS public server (8.8.8.8).

However, within some networks (for example, Canonical's London office) all public DNS will be blocked, so we should find and explicitly add the network's DNS server as a backup as well:

Get the address of your current DNS server

From the host OS, check the address of the DNS server you're using locally with nm-tool, e.g.:

$ nm-tool
...
  IPv4 Settings:
    Address:         192.168.100.154
    Prefix:          21 (255.255.248.0)
    Gateway:         192.168.100.101

    DNS:             192.168.100.101  # This is my DNS server address
...

Add your DNS server as a 2nd DNS server for Docker

Now open up the Docker config file at /etc/default/docker, and update or replace the DOCKER_OPTS setting to add Google's DNS server first, with yours as a backup: --dns 8.8.8.8 --dns [YOUR-DNS-SERVER]. E.g.:

# /etc/default/docker
# ...
# Use DOCKER_OPTS to modify the daemon startup options.
DOCKER_OPTS="--dns 8.8.8.8 --dns 192.168.100.101"
# Google's DNS first ^, and ours ^ second

Restart Docker

sudo service docker restart
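To double-check that new containers pick up both nameservers, you can inspect /etc/resolv.conf inside a throwaway container (a quick sanity check, assuming the stock ubuntu image):

$ sudo docker run ubuntu cat /etc/resolv.conf  # should list 8.8.8.8 and your local DNS server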

Success?

Hopefully, all should now be well:

$ sudo docker run -i -t ubuntu /bin/bash  # Start a docker container
root@0cca56c41dfe:/# apt-get update  # Try to Update apt from within the container
Get:1 http://archive.ubuntu.com precise Release.gpg [198 B]  # DNS resolves properly
...

Read more

If you glance up to the address bar, you will see that this post is being served securely. I've done this because I believe strongly in the importance of internet privacy, and I support the Reset The Net campaign to encrypt the web.

I've done this completely for free. Here's how:

Get a free certificate

StartSSL isn't the nicest website in the world to use. However, they will give you a free certificate without too much hassle. Click "Sign up" and follow the instructions.

Get an OpenShift Bronze account

Sign up to a RedHat OpenShift Bronze account. Although this account is free to use, as long as you only use 1-3 gears, it does require you to provide card details.

Once you have an account, create a new application. On the application screen, open the list of domain aliases by clicking on the aliases link (might say "change"):

Application page - click on aliases

Edit your selected domain name and upload the certificate, chain file and private key. NB: Make sure you upload the chain file. If the chain file isn't uploaded initially it may not register later on.

Pushing your site

Now you can push any website to the created application and it should be securely hosted.
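For example, with a static site the flow is roughly this (the Git URL and paths are placeholders - OpenShift shows you the real clone URL on the application page):

git clone ssh://[APP-ID]@myapp-mydomain.rhcloud.com/~/git/myapp.git/
cd myapp
cp -r ~/my-static-site/* .                    # drop your site into the repository
git add . && git commit -m "Add static site"
git push                                      # OpenShift redeploys on push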

Given that you only get 1-3 gears for free, a static site is more likely to handle high load. For instance, this site gets about 250 visitors a day and runs perfectly fine on the free resources from OpenShift.

Read more

Following are some guidelines about Agile philosophy that I wrote for my team back in September 2012. If you're thinking about implementing Agile, you might also find this StackExchange answer helpful.


Agile software development is a philosophy for managing software projects and teams. It has similarities to lean manufacturing principles for "eliminating waste".

The philosophy centers around the agile manifesto:

We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:

  • Individuals and interactions over Processes and tools
  • Working software over Comprehensive documentation
  • Customer collaboration over Contract negotiation
  • Responding to change over Following a plan

That is, while there is value in the items on the right, we value the items on the left more.

Of the various software development methodologies out there, Scrum and Extreme programming particularly try to follow agile software development principles.

Lean philosophy

Lean software development is rapidly gaining support within the agile community.

The 7 principles of lean software development are:

  1. Eliminate waste
  2. Amplify learning
  3. Decide as late as possible
  4. Deliver as fast as possible
  5. Empower the team
  6. Build integrity in
  7. See the whole

Agile practices and principles

Without choosing to follow any one defined methodology for project management, here are some common practices that could be adopted by an agile team:

  • Maintain a prioritised backlog of work with a single backlog manager
  • Have a team whiteboard with sticky notes for keeping track of your tasks and blockers - it really helps to have visibility of the work taking place
  • Break down work into manageable chunks - each less than a day's work
  • Try to use relative sizing to size up work, rather than actual concrete amounts of time. Here's why.
  • Have a daily scrum including as many stake-holders as possible so everyone knows what's going on and how things are progressing
  • Try to produce the minimum viable product (this principle is linked to release early release often and iterative development)
  • Fixed time-scales, variable requirements - have fixed production deadlines and structure work such that chunks can be dropped from the iteration if they're not ready
  • Measure the team's velocity and use it to estimate work
  • Use pair-programming for knowledge-sharing and for working through blockers

Read more

I wrote this set of programming principles for my team to follow back in 2012. I'm sure there are many like it, but this one is mine. May you find it useful.

Writing code

Try to write expressive code.

Beware code bloat - adhere to the YAGNI principle

Practicing Behaviour-Driven Development can help with both of these aims.

Do less: Before writing a new piece of functionality, go and look for similar solutions that already exist and extend them.

Code architecture

Namespace your classes, and code to an interface (this is an implementation of the Design by Contract principle), and make your interfaces (both programming interfaces and user-interfaces) as simple as possible.

Try to learn and comply with all 5 principles of SOLID (watch this great video).

Learn as many Design Patterns as you can to inform your coding, but beware of implementing them blindly. Developers can be over-zealous in their use of Design Patterns and may end up over-engineering a solution.

Some useful design patterns:

Tools

Try to learn an IDE with advanced features. These can really save you a lot of time:

  • Syntax highlighting
  • Auto-complete for function, class and method names
  • Auto-formatting
  • Code navigation help - e.g. jump to class declaration
  • Collapsing of code blocks
  • Overviews of code, e.g. a list of all methods within a class
  • Debugging tools like break points

Some suggestions:

Read more

Luminous beings are we

A diary entry from October 15th 2013

Today I very much wanted to work on my voice. Work out how to get my message across - feel like I was saying something genuine, something of significance.

I like the idea of sketches. Particularly sketches about systems and networks. How everyone is connected, and human society grows like an organism, each little autonomous cell influencing each other one. We are like a neural network.

And I wanted to illustrate how these autonomous nodes make up an ebbing and flowing tide, with each individual or group potentially changing the direction of the tide. We are all connected, we all influence each other, we all have power to change the flow of the tide, but we also are swept along by it. I find this vision inspiring but not intimidating. Any one of us can be the instigator of a change of direction, but we are under no pressure to be.

Hmm. Some academics probably study sentient fluids.... Like traffic. That would be an interesting topic.

People grow and develop in this way too. We rush or stagnate through deliberate or accidental events. We are none of us ultimately in control. I believe this absolves any one person of too much responsibility, but at the same time we are all responsible. I wish I could communicate this idea succinctly. I hope a vision like this can lead to people judging each other less. It's hard to explain how.

I think this is like a hacker's vision. There are endless possibilities for this organism. No-one knows where it will go. There is no defined end-goal. We are constantly discovering. Every individual life is a unique exploration. There can be no higher goal than to explore, finding solutions and perspectives that are unique, continuing the exploration.

This is hacking - life is hacking.

But somehow I feel like I'm letting down this purpose. I am not exploring as much as I could be. I'm somewhat stagnating. I'd like to be inspiring people, and communicating my thoughts and ideas honestly. I certainly feel like I have thoughts and ideas, unique perspectives, and my current job and my current lifestyle are not realising one tenth of them. How to solve this?

That'll do for now. Goodnight, diary.

Read more

Writing expressive code

As any coder gains experience, they inevitably learn more and more ways to solve the same problem.

The very first consideration is simplicity. We probably want to use as simple and direct a solution as possible - to avoid over-engineering. But the simplest solution is not necessarily the shortest solution.

After simplicity, the very next consideration should be expressiveness. You should always be thinking about how deeply a new developer is going to have to delve into your code to understand what's going on.

Code is poetry

Writing expressive code may help future coders to understand what's going on. It may even help you in the future. But it may also help you simply to understand the problem. Thinking carefully about how to define and encapsulate the components of your solution will often help you to understand the problem better, leading to a more logical solution.

"Self-documenting code"

"Self-documenting code" is about structuring your code and choosing your method and variable names so that your code will be largely self-describing. This is a great practice, and can make some comments redundant:

$user = new User(); // create a new user object
$user->loadFromSession($_SESSION); // update the user from the session
if ($user->isAuthenticated()) { ... } // If the user is authenticated...

However, as a recent discussion with a friend of mine highlighted to me, expressive code is not a replacement for comments - no code is entirely "self-documenting". Always write as expressively as you can, but also always document where it makes sense. Methods, functions and classes should always be summarised with a comment - as mentioned in the Python coding conventions.
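For example, even a well-named function benefits from a short summary comment (a trivial sketch):

/**
 * Return the user's preferred language, falling back to the
 * site-wide default when the user hasn't chosen one.
 */
function getPreferredLanguage(user, defaultLanguage) {
    return user.language || defaultLanguage;
}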

Wording

It's worth thinking carefully about how you name your variables and methods.

Don't abbreviate

var uid = 10; // I am unlikely to know what uid stands for without context
var userIdentifier = 10; // Better

Be specific

Use as concrete and specific nouns as you can to describe methods and functions:

var event; // bad - generic
var newsLinkClickEvent; // good - specific

Encapsulation

No-one likes to read a really long procedural program. It's very difficult to follow. It's much easier to read a shorter set of well-encapsulated method calls. If you need to delve deeper, simply look in the relevant method:

// Instead of showing you all the details of how we update the user
// We encapsulate that in the updateDetails method
// allowing you to quickly see the top-level processes
function saveUserDetails(userStore, userDetails) {
    var user = new User();
    user.updateDetails(userDetails); // sets a whole bunch of details on the user
    userStore.save(user); // Converts user data into the correct format, and then saves it in the user store
}

Do you need an else?

The use of many if .. else conditionals makes programs confusing. In many cases, the else part can be encapsulated in a separate method or function call, making the program easier to read:

// With the else
if (user.permissionGroup == 'administrator') {
    article.delete();
} else {
    page.showError("Sorry you don't have permission to delete this article");
}
// Without the else
if (!user.deleteArticle(article)) {
    page.showError("Sorry you don't have permission to delete this article");
}

In cases where a switch is used, or multiple if .. else if statements, you could consider using different types instead:

class User {
    function deleteArticle($article) {
        $success = false;

        if (
            $this->permissionGroup == 'administrator'
            || $this->permissionGroup == 'editor'
        ) {
            $success = $article->delete();
        }

        return $success;
    }
}

You can remove the need for this if, by making special types:

trait ArticleDeletion {
    function deleteArticle($article) {
        return $article->delete();
    }
}

class Editor implements User { use ArticleDeletion; }
class Administrator implements User { use ArticleDeletion; }

Notice that I've deliberately opted not to make Administrator inherit from Editor, but instead compose them separately. This keeps my structure more flat and flexible. (For this to work, User would now need to be an interface rather than the concrete class shown earlier.) This is an example of composition over inheritance.

Depth

While encapsulation is often a good thing, to make programs easier to understand at the higher level, it's important to preserve the single responsibility principle by not encapsulating separate concerns together.

For example, one could write:

var user = new User();
user.UpdateFromForm(); // Imports user data from the page form
user.SaveToDatabase();

While this is both short and fairly clear, it suffers from two other problems:

  • A developer reading this has to delve further into the code to find basic information, like the name of the Database class, or which form the details are stored in
  • If we want to use a different instance of the Database, we have to edit the User class, which doesn't make a whole lot of sense.

In general you should always pass objects around, rather than instantiating them inside each other:

var user = new User();
var userData = Request.Form;
var database = new DatabaseManager();

user.ImportData(userData);
database.Save(user);

This is more lines, but it is nonetheless clearer what is actually happening, and it's more versatile.

Tidiness

Always try to format your code so that it is easily readable. Don't be afraid of white space, and use indentation sensibly to highlight the structure of your code.

Where there is an accepted code style guide, you should try to follow it. For example, PHP has the FIG standards.

However, I don't think it's worthwhile being overly anal about code standards (my thinking has evolved on this somewhat) because you'll never be able to get everybody to code exactly the same way. So if (like me) you're a coder who feels the need to reformat code whenever you see it to make it fit in with anal standards, you could probably do with training yourself out of that habit. As long as you can read it, leave it be.

Delete commented out code

If you're using a version control system (like Git) there really is no need to keep large blocks of commented-out or unused code. You should just delete it, to keep your codebase tidier. If you really need it again, you can just go and find it in the version control history.

Trade-offs

There will always be a trade-off between expressiveness and succinctness.

Depth vs. encapsulation

It is desirable to keep as flat a structure as possible in your objects, so that programmers don't have to delve through parent class after parent class to find the relevant bit of code. But it is also important to keep code encapsulated in logical units.

Both goals are often achievable through composition over inheritance, using dependency injection or traits / multiple inheritance.

Special syntax

In many languages there are often slightly obscure constructs that can nonetheless save time. With many of these there is a readability vs. simplicity trade-off.

Ternary operators and null coalescing

Both C# and PHP have a shorthand for this - C#'s null coalescing operator, and PHP's short ternary ("Elvis") operator:

var userType = user.Type ?? defaultType; // C# - null coalescing
$userType = $user->Type ?: $defaultType; // PHP - short ternary ("Elvis")

And almost all languages support the ternary operator:

var userType = user.Type != null ? user.Type : defaultType;

Both of these constructs are much more succinct than a full if .. else construct, but they are less semantically clear, hence the trade-off. Personally, I think it's fine to use the ternary operator in simple conditionals like this, but if it gets any more complicated then you should always use a full if .. else statement.

Plugins / libraries

For example, in C#:

Fish brownFish = null;  // assuming the collection holds Fish objects

foreach (var fish in fishes) {
    if (fish.colour == "brown") {
        brownFish = fish;
        break;
    }
}

Can be simplified with the Linq library:

using System.Linq;

var brownFish = fishes.First(fish => fish.colour == "brown");

The latter is clearly simpler, and hopefully not too difficult to understand, but it does require:

  1. Knowledge of the Linq library
  2. An understanding of how lambda expressions work

I think that in this case the Linq solution is so much simpler and quite expressive enough that it should definitely be preferred - and hopefully if another developer doesn't know about Linq, it will be quite easy for them to pick up, and will expand their knowledge.

Single-use variables

While the following variable is pointless:

var arrayLength = myArray.length;

for (var arrayIterator = 0; arrayIterator < arrayLength; arrayIterator++) { ... }

There are some cases where variables can be used to add useful semantic meaning:

var slideshowContainer = jQuery('main>.show');

slideshowContainer.startSlideshow();

Read more

In the last couple of months I've had a number of discussions with people who were under the impression that encryption has been cracked by the NSA.

If you like, jump straight to what you can do about it.

The story

The story started in September, in the Guardian:

NSA and GCHQ unlock encryption used to protect emails, banking and medical records

(Guardian - Revealed: how US and UK spy agencies defeat internet privacy and security, James Ball, Julian Borger and Glenn Greenwald, 5th September 2013)

This came up again today, because Sir Tim Berners-Lee made a statement:

In an interview with the Guardian, he expressed particular outrage that GCHQ and the NSA had weakened online security by cracking much of the online encryption on which hundreds of millions of users rely to guard data privacy.

(Guardian - Tim Berners-Lee condemns spy agencies as heads face MPs, Ed Pilkington, 7th November 2013)

And something very similar to this was stated in the Radio 4 news program I was listening to this morning.

The worry

On the face of it this sounds like the NSA's geniuses have reverse-engineered some core cryptographic principles - e.g. worked out how to quickly deduce prime factors from a public key (read an explanation of RSA).

This would be very serious. I was sceptical though, because I believe that if there were key vulnerabilities in public algorithms, the public would have found them long before the NSA. They don't have a monopoly on good mathematicians. This is, after all, why open-source code and public algorithms are inherently more secure.

The truth

Helpfully, MIT Technology Review published an article 4 days later clarifying what the NSA had likely achieved:

New details of the NSA’s capabilities suggest encryption can still be trusted. But more effort is needed to fix problems with how it is used.

(NSA Leak Leaves Crypto-Math Intact but Highlights Known Workarounds, Tom Simonite, 9th September 2013)

This shows that (still as far as we know) the NSA have done nothing unprecedented. They have, however, gone to huge lengths to exploit every known vulnerability in security systems, regardless of legality. Mostly, these vulnerabilities are with the end-point systems, not the cryptography itself.

What the NSA and GCHQ have done

I've tried to list these in order of severity:

  • Intercepted huge amounts of encrypted and unencrypted internet traffic
  • Used network taps to get hold of Google and Yahoo's (and probably others') unencrypted private data as it's transferred between their servers
  • Acquired private-keys wherever they can, presumably through traditional hacking methods like brute-forcing passwords, social engineering, or inside contacts.
  • Built back doors into certain commercial encryption software products (most notably, Microsoft)
  • Used brute-force attacks to find weaker (1024-bit) RSA private keys
  • Used court orders to force companies to give up personal information

A word about RSA brute-forcing

We have known for a while that 1024-bit RSA keys could feasibly be brute-forced by anyone with enough resources - and many assumed that the U.S security agencies would almost certainly be doing it. So for the more paranoid among us, this should be no surprise.

“RSA 1024 is entirely too weak to be used anywhere with any confidence in its security” says Tom Ritter

However, MIT also claim that these weaker keys are:

used by most websites that offer secure SSL connections

This surprises me, as I know that GoDaddy at least won't sell you a certificate for a key shorter than 2048-bit - and I would assume other certificate vendors would follow suit. But maybe this is fairly recent.

However, even if "most websites" use RSA-1024, it doesn't mean that the NSA is decrypting all of this encrypted traffic, because it still requires a huge amount of resources (and time) to do, and the sheer number of such keys being used will also be huge. This means the NSA can only be decrypting data from specifically targeted sites. They won't have decrypted all of it.

What you can do

Now that we know this is going on, it only means that we should be more stringent about the security best-practices that already existed:

  • Use only public, open-source, tried and tested programs and algorithms
  • Use 2048-bit or longer RSA keys (see the example after this list)
  • Configure secure servers to prefer "perfect forward secrecy" cyphers
  • Avoid the mainstream service providers (Google, Yahoo, Microsoft) where you can
  • Secure your end-points: disable your root login; use secure passwords; know who has access to your private keys
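On the key-length point, generating a stronger key is a one-liner with OpenSSL. A quick example - the file names are just placeholders:

# Generate a 4096-bit RSA private key and a certificate signing request
openssl genrsa -out example.com.key 4096
openssl req -new -key example.com.key -out example.com.csr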

Read more

On Saturday night, there was a big fight outside one of our night-clubs here in Nottingham, in which 3 people were stabbed.

BBC publishing stupid opinions

The BBC wrote an article, including a quote from the nightclub owner:

This is not a localised problem, knife crime is becoming a huge national issue. Community sentences and conditional discharges do nothing to discourage criminals

and the pull-quote:

Tougher sentences needed

I don't understand why the BBC felt the need to give a platform to this particular schmuck. It is the responsibility of journalists, in my opinion, to stem the tide of sensationalism after events like this - after all, they should understand better than anyone the frequency with which stories like this occur.

The truth about knife crime

According to knife crime statistics from parliament.uk:

The number of knife offences recorded (during the year to June 2012) was 9% lower than in the preceding year.

NHS data suggests there were 4,490 people admitted to English hospitals in 2011/12 due to assault by a sharp object. The lowest level since 2002/03.

Similarly, the Office for National Statistics has stats showing that total knife-related offences in the year to March 2013 is 26,336, down from 31,147 the previous year.

So, knife crime is not "becoming" any kind of problem. It's an old problem, but it's improving. So shut-up Simon Raine.

Also, I don't believe "tougher custodial sentences" have ever been the best solution. I don't have time to find the evidence now, but I believe custodial sentences only harden criminals, and that rehabilitation is the way forward. And the police and the justice system are slowly realising this - which may be partly helping the knife crime stats. Don't let stupid opinions like these derail that effort.

Read more

If you want a tool to crawl through your site looking for 404 or 500 errors, there are online tools (e.g. the W3C's online link checker), browser plugins for Firefox and Chrome, or Windows programs like Xenu's Link Sleuth.

A unix link checker

Today I found linkchecker - available as a unix command-line program (although it also has a GUI or a web interface).

Install the command-line tool

You can install the command-line tool simply on Ubuntu:

sudo apt-get install linkchecker

Using linkchecker

Like any good command-line program, it has a manual page, but it can be a bit daunting to read, so I give some shortcuts below.

By default, linkchecker will give you a lot of warnings. It'll warn you about any links that result in 301s, as well as all 404s, timeouts, etc., and it gives you status updates every second or so.

Robots.txt

linkchecker will not crawl a website that is disallowed by a robots.txt file, and there's no way to override that. The solution is to change the robots.txt file to allow linkchecker through:

User-Agent: *
Disallow: /
User-Agent: LinkChecker
Allow: /

Redirecting output

linkchecker seems to be expecting you to redirect its output to a file. If you do so, it will only put the actual warnings and errors in the file, and report status to the command-line:

$ linkchecker http://example.com > siteerrors.log
35 URLs active,     0 URLs queued, 13873 URLs checked, runtime 1 hour, 51 minutes

Timeout

If you're testing a development site, it's quite likely it will be fairly slow to respond and linkchecker may experience many timeouts, so you probably want to up that timeout time:

$ linkchecker --timeout=300 http://example.com > siteerrors.log

Ignore warnings

I don't know about you, but the sites I work on have loads of errors. I want to find 404s and 50*s before I worry about redirect warnings.

$ linkchecker --timeout=300 --no-warnings http://example.com > siteerrors.log

Output type

The default text output is fairly verbose. For easy readability, you probably want the logging to be in CSV format:

$ linkchecker --timeout=300 --no-warnings -ocsv http://example.com > siteerrors.csv

Other options

If you find and fix all your basic 404 and 50* errors, you might then want to turn warnings back on (remove --no-warnings) and start using --check-html and --check-css.

Checking websites with OpenID (2014-04-17 update)

Today I had to use linkchecker to check a site which required authentication with Canonical's OpenID system. To do this, a StackOverflow answer helped me immensely.

I first accessed the site as normal with Chromium, opened the console window and dumped all the cookies that were set in that site:

> document.cookie
"__utmc="111111111"; pysid=1e53e0a04bf8e953c9156ea841e41157;"

I then saved these cookies in cookies.txt in a format that linkchecker will understand:

Host:example.com
Set-cookie: __utmc="111111111"
Set-cookie: pysid="1e53e0a04bf8e953c9156ea841e41157"

And included it in my linkchecker command with --cookiefile:

linkchecker --cookiefile=cookies.txt --timeout=300 --no-warnings -ocsv http://example.com > siteerrors.csv

Use it!

If you work on a website of any significant size, there are almost certainly dozens of broken links and other errors. Link checkers will crawl through the website checking each link for errors.

Link checking your website may seem obvious, but in my experience hardly any dev teams do it regularly.

You might well want to use linkchecker to do automated link checking! I haven't implemented this yet, but I'll try to let you know when I do.
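One possible (untested) approach would be a weekly cron job that emails you the CSV report, something like:

# Example crontab entry: run every Monday at 02:00 and email the report
0 2 * * 1  linkchecker --timeout=300 --no-warnings -ocsv http://example.com > /tmp/siteerrors.csv && mail -s "Link check report" webmaster@example.com < /tmp/siteerrors.csv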

Read more

SeeTheStats is a great free service for exposing your Google Analytics data (the only way to do Analytics) to the public.

Here is some information about my site:

How many people visit my site?

What country are they from?

What pages are they looking at?

What browsers are they using?

What operating systems are they using?

How big are their screens?

My SeeTheStats page

You can also see all these stats over at SeeTheStats.com.

Read more

With the advent of web fonts (e.g. from Google Fonts), thankfully web designers are no longer tied to a limited set of "web safe" fonts.

Fonts and performance

However, there is a potential performance hit with this. You will need to link your CSS files to the font files. The problem here isn't so much the size of the font file (they are typically under 100 KB), it's more that each new HTTP request a page makes affects performance.

Also, when loading web fonts externally you will sometimes see a flicker where the page loads initially with the default browser fonts, and then the new fonts are downloaded and applied afterwards. This flicker can look quite unprofessional.

Font formats and IE8

If you want to support Internet Explorer 8 or older, you unfortunately need to include your fonts in two formats: WOFF and EOT.

However, if you're willing to drop IE8 support (and reap the benefits), or to simply serve the browser default font to IE8, then you can provide your fonts in WOFF only, which is supported by all other relevant browsers.

Data URLs

So Data URLs, if you haven't heard of them, are a way of encoding binary data as a valid URL string. This means the data can be included directly inside HTML or CSS files. They are fantastically easy to create by simply dragging your binary file into the Data URL Creator.
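If you'd rather do it from the command line, the standard base64 tool can build one for you (a quick sketch - the font file name is just an example):

# Print a Data URL for a WOFF font file (GNU coreutils base64)
echo "data:application/x-font-woff;base64,$(base64 -w 0 Lato-Light.woff)"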

Data URLs are likely to be a bit larger than the binary file would have been. In my experience they tend to be about 20% larger. So the larger the file you're dealing with the less practical it becomes to encode the file as a URL. However, for sub-100k web fonts this difference is not so important.

So using Data URLs, you can include your font directly in your CSS like so:

/* http://www.google.com/webfonts/specimen/Lato */
@font-face {
    font-family: 'Lato light';
    font-style: normal;
    font-weight: 300;
    src: local('Lato Light'), url('data:application/x-font-woff;base64,d09GRg...BQAAAAB') format('woff');
}

(here's one I prepared earlier)

This will now mean that your web pages will only have to download one CSS file, rather than a CSS file and a bunch of font files, which will help performance. Personally I think it's also neat not to have to create a special directory for font files. Keeping it all in one place (CSS) just seems nice and neat to me.

A word about caching

Whether the above suggestion is actually a good idea will depend on how often your CSS changes. Hopefully you'll be merging your CSS files into one file already to reduce HTTP requests. This of course means that whenever that merged CSS file changes, your users will have to download the whole file again to see your changes.

If your fonts were downloaded as separate files, rather than being included in your CSS, then the fonts may well be cached even if the CSS has changed. However, if you include your fonts inside your CSS files as suggested above, this will mean that whenever your CSS changes a much larger CSS file will have to be downloaded each time. Including your fonts inside your CSS is likely to double the size of your CSS file.

This is a complex decision, but to give you some rough advice I'd say - if your CSS changes more than a couple of times a month then keep your fonts as separate files. If it's less often (as it is with this site) then it's probably worth including them inside the CSS as Data URLs.

If you have a different opinion on this, please let me know in the comments.

Read more

I am always thinking about good general rules for making the world a better place, but it's extremely difficult to succinctly communicate them to anyone.

This is the story of how my friends and I created and agreed on a statement of values.

The foundation

A couple of months ago, I was in an IRC chat room with some friends of mine (do people actually still use IRC? tell me in the comments), and @0atman aired an idea for a charitable project. We all thought it was a good one, and a long discussion ensued about the best way to run the project.

We all felt that it should be run democratically to some extent - that is, largely owned by its members - but we were worried about the project being hijacked and becoming something that none of us wanted it to be.

A potential solution, we felt, was to first create a foundation with exclusive membership and a solid stated set of values. That way, the project could be started by the foundation, but not inherently attached to it, meaning that if the project took a different direction, the foundation would remain intact. This would allow us to either create a fork of the project, bringing it back in line with our values, or start a completely new one, while allowing the existing project to continue in its new direction with our blessing.

Thus was formed the Blackgate Foundation.

(Nothing has come of the project idea yet. I hope it may in the future.)

Arguments over values

Since we formed the foundation specifically to be a solid moral centre for our future projects, the values of the foundation were paramount, so we started debating them in earnest.

Politically and morally we have a lot of things in common, but it was surprising how much we found to argue about. We disagreed about the necessity for punishment, whether there's ever a case to go to war, whether utilitarianism was a term we could or should associate ourselves with, whether we agreed with the values of humanism, and our opinions on religion.

We discussed it for days, on IRC and in comments and edits on a Google Document (I don't want to advertise Google particularly, but Google Documents really are an amazingly effective way to collaborate with people). It got kinda heated at times. But eventually we came out with a largely agreed upon statement of values, and I think our individual values all changed a little along the way.

The statement of values

I am proud of what we produced, and I had a lot of fun doing it. I think it sums up my values rather well. I think it's firm and clear without being offensive or inflammatory. I'd love to know what you think of it - please let me know in the comments.

It can be seen on the Blackgate Foundation website or in our GitHub repository, but I'm also reproducing it here in its current form (we may decide to change it in the future):

Statement of values

We, the members of the Blackgate Foundation, value:

Equality

  • Humanity should strive to treat and provide for all people equally regardless of appearance, sexuality, gender, beliefs, ability or actions.
  • All people should be equally represented and no person fundamentally deserves to be better off than any other.

Science & openness

  • The pursuit of knowledge is a human instinct and a universal force for good.
  • There is value in sceptical, evidence-based and objective reasoning in the pursuit of knowledge.
  • Knowledge should be made available to all of humanity. We should strive to build on existing work rather than doing work from scratch.
  • There is value in open processes and collective decision making - many eyes guard against injustices and inefficiencies.

Diversity

  • Diversity is important in all things. Many opinions and diverse practices prevent stagnation, create resilience through redundancy and speed evolution and learning.
  • Centres of control should be diverse and small and subservient and answerable to all over whom they hold influence. Any decisions by such centres should be evidence based and open to discussion.
  • The interests of humanity should always come before those of any individual or group; this applies particularly to corporate protectionism and nationalism.

Pacifism

  • Violence in all its forms is divisive and inflammatory and therefore always undesirable.
  • We renounce the glorification of violence and the use of violence to solve disputes.
  • It is in the interest of humanity to seek to understand and help those who act violently.

Evidence-based morality

  • Morality is not absolute. Moral guidelines should be formed through evidence-based reasoning.
  • There exist solid evidence-based arguments for the most universally accepted moral tenets.
  • "Bad" and "evil" are counter-productive concepts. Humanity should strive to avoid ultimately judging any person as either.

Sustainability

  • All human activity should continually strive to be sustainable. Notable examples are human impact on the environment and the global economy.

Try it!

Why don't you try writing down your morals and values in a similar form? Or do it with some friends? I really enjoyed it and couldn't recommend it more.

Read more

Static site generators (like Jekyll and Hyde) offer a much simpler and more transparent way to create a website. There's a small learning curve, but it's totally worth it. Especially if you're a developer already.

What is a static site generator?

A piece of software that can read a set of files in a particular format and convert them into static files (e.g. HTML &c.) that can then be served directly as a website.

Note that just because a site is static on the server-side doesn't mean it can't be dynamic on the client-side. You can easily include comments and other dynamic functionality through JavaScript plugins.

The workflow goes something like this:

$ sublime-text _posts/2013-05-30-why-i-love-the-internet.md # create a new blog post
$ jekyll --server # build the static site into my _site/2013/05/30/why-i-love-the-internet.html directory and run a test server
# check the site and my new blog post look okay
$ git add . && git commit -m 'new post: why i love the internet' # save it in version control
$ git push heroku # release the change to my live site (I use heroku)
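For reference, the post file itself is just Markdown with a small block of YAML "front matter" at the top, roughly like this:

---
layout: post
title: "Why I love the internet"
date: 2013-05-30
---

The post content itself, written in plain Markdown.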

Why bother?

Personally I think static sites make managing websites really fun.

For the right kind of project, static sites can make it so much simpler to manage a site. They remove a whole bunch of concerns that you used to have to worry about (e.g. with CMSs like Wordpress or Drupal, or frameworks like Django, Rails or Symfony):

  • caching - You can forget about server-side caching, since you're already serving static files
  • databases - You don't need a database - all the data is stored as files
  • version control - You can easily keep your whole site, including document changes, in version control
  • easy to start - you hardly have to write any code to get started.
  • easy to maintain - Tweaking your site is more transparent and direct - you can easily view and edit the static files directly.

Which sites make sense?

Any site that needs to do complex server-side work will not be appropriate. However, any site which is basically just a collection of static information - like a blog, a brochure site, or even a news or magazine site - could work as a static site.

The other important thing is that everyone who wants to be able to edit the site needs to learn how to do it.

This needn't necessarily exclude anyone. Many static site generators use Markdown document syntax, which anyone can understand and learn. GitHub even lets you edit files directly online, so anyone with permission can edit the website files. Editors will have to understand the concept of version control, and understand how the site structure works, but this shared understanding will probably aid rather than hinder your project's progression.

In any case, if the only people who edit the site directly are developers then using a static site generator should come absolutely naturally.

How?

There are many static site generators out there, written in many different languages.

Personally I use jekyll for my website. Originally this was because it is natively supported in Github Pages.

I'm not going to go into how to use Jekyll in depth in this post, but I'll try to write another couple of posts soon:

  1. How to set up a basic static site with Jekyll on Github Pages
  2. How to host a Jekyll-based site on Heroku

Read more

I have many interests, but I think there are two common threads running through them all:

  • I care deeply, fundamentally about fairness and equality
  • I am very interested in complex systems

"Complex systems" sounds extremely abstract, but I think it really is the core of my academic interest. I like mapping systems in my head, seeing the nodes; seeing the ways they interact with each other. I like working out how to create elegant systems and optimal systemic solutions for solving problems.

This leads me in two directions:

  1. I love technology. Technology, along with all the problems it's trying to solve, creates and makes use of myriad systems and systemic structures. I love trying to understand these systems.
  2. I love social systems and social science. People are complex, and there are extremely subtle and nuanced rules governing how they think and interact in social systems. I love pondering people and psychology.

Running through all the mini projects and fancies that flow from my interest in systems is my deep desire for global fairness and equality. I believe that technology has the capacity to be a great equaliser. Most people in the world don't really have a voice to influence the global power-structures, but hopefully the internet and communications technology can give them that voice.

In a nutshell, this is why I love the internet.

Read more

Chrome version 25 appears to have made a pretty serious change to how the HTML5 input date type is rendered.

Now the date type defaults to display: -webkit-inline-flex, and (this is the bad bit) if you use display: block the layout breaks:

date field layout broken

(try it yourself)

Why is this bad?

We use the date type on Arena Blinds, and to have more control over the layout of the input fields, they are all set to display: block. I think this is, if not "best", at least a pretty common practice.

So one day we realised our date fields looked broken in Chrome, and it was because of this issue. So my boss said:

If we can't rely on the date control not to break, we have to abandon the HTML5 date field altogether

And that's entirely fair reasoning.

Cognitive dissonance

My boss's perfectly reasonable conclusion goes against everything progressive that I've been trying to instil in my team.

Progressive enhancement is accepted best practice nowadays - to use the built-in functionality when it's there, with fall-backs for browsers that don't support it. E.g.:

if (!Modernizr.inputtypes['date']) {
    $('input[type=date]').datepicker();
}

This is a solid approach I strongly believe in. But if Chrome are going to implement breaking changes like this, I don't know what to think any more.

Chrome, you've ruined my day.

Read more

I was just entering an expense on Splitwise and I noticed a subtle little widget in the bottom of the screen saying "Feedback". "Aha!" thinks me. "This is exactly the sort of thing all websites should do". So I click it and find out it's made by Uservoice.

Websites need user feedback. They need it all the time. So we need to be constantly offering users the opportunity to tell us what they think, without annoying them by bugging them all the time, and somehow avoid getting 50 copies of the same issue submitted.

Well done Uservoice

I think Uservoice got this exactly right. You get a subtle link on the side of the site saying one word - "feedback". You'll probably notice it, instantly know what it's there for, and it's easy to ignore if you want.

When you click it, you get given a list of current suggestions on the left that you can vote on, or you can submit your own suggestion on the right. It's perfect.

The feedback link is totally customisable, and easy to include in your site with a simple JavaScript snippet.

I use the service on this very site (look to the right). Please click it to see Uservoice in action and please leave me some feedback :).

The missing link - Github Issues integration

Immediately I thought "where will these suggestions be stored?" because I was already managing my own list of ideas in Github Issues (augmented with Huboard) and I didn't like the idea of having to maintain two lists, or manually copy issues between the two.

Someone had already suggested integration to Uservoice, but it turns out there's already a slick solution with Zapier.

Zapier is an integration service - for linking various different APIs. And they already have built-in support for linking Uservoice to Github Issues.

But how much does it cost?

For this website I certainly can't afford to pay for either service. So it's a good thing that both Zapier and Uservoice follow a similar model to other modern digital projects like Heroku. That is - it's free for light or personal use, but when you want to scale it you have to start paying.

Which suits me just fine.

Read more

Try visiting this site in IE8. Go on, I dare ya. Alright, I'll tell you - it's an ugly white page with black writing. Oh except for a banner at the top telling you to upgrade your browser.

In recent years we have said goodbye to widespread support for first IE6 and then IE7.

Google dropped support for IE8 back in November, 37signals also. There are a plethora of articles out there imploring people to drop support for Internet Explorer.

IE8 usage

According to theie8countdown.com, global usage is at 24%. On this blog, it's at 1.5%, and on my company's website, Arena Blinds (used by much less tech-savvy people), it's at 15%.

So if you were bold (like this site), you could probably drop support completely and affect less than 1/5 of your visitors. And those visitors would quickly upgrade their browsers.

Advantages

Dropping support comes with significant advantages:

These four points will dramatically affect your front-end debugging time.

Consider it.

Read more