Canonical Voices


I was just having a discussion with my friends about If I Were You adding an extra pre-roll advertisement to their latest podcast, and it inspired me to write about my moral opinion of advertising in general.

Selling consumers

By choosing to add an advertisement to a magazine article, TV show or podcast, the content creator is choosing to sell a portion of their audience's attention. The audience has devoted their time to the actual content, but they are also subjected to an advertisement for some random product.

Now you could argue that everyone who watches any media with ads knows that this is the deal. They are choosing to watch the show, knowing it is ad-supported, so they should be allowed to make that choice. Where's the harm?

My problem with it is the insidious effect that it has on that audience, and society at large. The advertising space is up for sale, often simply to the highest bidder. That means that whoever is willing to pay the most gets to subtly manipulate that audience. Are all those audience members aware that that's what they're signing up for? And even if they are, what about the wider effect on society?

Advertising contributes hugely to obesity - the most serious health problem facing western nations - as well as to eating disorders and other psychological problems.

Societal capture

While it is true (and a great thing) that we are all becoming wiser to the tricks of advertisers, adverts still carry a huge amount of power. We all know that campaign finance for US election races basically decides the outcome. If you can spend billions on your campaign adverts, you will almost certainly win.

While possibly not quite as harmful as campaign adverts, I believe the same theory applies to advertising at large. The biggest companies can afford to buy more of these random advertising slots than anyone else, and it has a huge effect on society. Is there anyone who hasn't heard of Coke or McDonalds? How many women don't feel a constant pressure to look slim and beautiful? And this advertising also helps the massive corporations keep their monopolies.

Society is genuinely shaped by the media, and the media is made up of a huge amount of advertisements. This means that the corporations with the most money get to shape society in a way that suits them. And that model for society is always based on bigger profits for those companies, not the interests of society.

If there were fewer media spots up for sale, I believe the whole of society would benefit immeasurably.

Advertising is a major culprit in runaway climate change

The biggest and most obvious problem is that advertising, beyond a shadow of a doubt, fuels consumerism and therefore over-consumption. And this consumerism is terribly bad for the climate - the number one danger facing humanity. We are at a point where developed nations are producing emissions at a catastrophic rate. And there's no one culprit - our societies are simply structured to be wasteful. We consume more food than we need, and buy a lot more than we consume. We all fly all over the planet all the time. We buy new clothes, and throw out old ones, far more often than we need to.

And all of this is because big corporations - whose sole interest is in us continuing to consume in ever greater quantities - get to constantly manipulate everyone in society with their money, through paid advertisements.

Financing without ads

The problem is, so many free services that we currently enjoy would simply not exist without ads. Most of the digital services we rely on are entirely ad-sponsored (Facebook, Google and Bing's myriad services, Twitter, Youtube). To be fair, Google have worked to make ads a bit less intrusive, and I do think that's a good thing, but it's not like the corporate influence on society seems to have reduced at all since 1998.

If advertising were somehow less profitable, or just too morally odious to justify, then these digital services would have to be based on considerably different profit models, and they may well not exist at all. The obvious alternate model is to simply charge directly for these services, but only a tiny fraction of the people who use these services today would have signed up to pay even a small amount for them. I can't pretend this isn't a difficult problem.

I would genuinely like to see more companies try different profit models. For example, Github provide a full free service for open-source work, but charge for privacy, Humble Bundle let you "pay what you like" for content, and Wikipedia are financed purely through donations.

I also believe that if more companies were more honest and open with their finances, fans would be happier to help out by paying donations or subscriptions.

Ethical advertising

Okay, let's be honest, advertising isn't going anywhere. But I still hope that we can try to limit the damage by requiring content creators to be more ethical with their advertising.

I think any advert on any website, TV show, magazine article or whatever should be considered an endorsement. Any criticisms leveled against the advert or the company that made the advert should also be applied to the organisation that chose to give the advertisement air-time. This does happen to some extent (e.g. the This World advert in the Guardian), but I think it should happen more. This would hopefully force organisations to take more ethical responsibility over who they sell advertising space to, which would do a world of good.

It would also be nice if content-creators were choosing adverts, rather than the media company that distributes the content - e.g. adverts in the breaks in the middle of TV shows should be chosen by the TV show authors. This would mean that the fans of the show would at least be watching adverts that the creator chose.

Installing Ad-Block

Some think it's unethical to install Ad-Block, as then you are potentially depriving the good content-creators of their revenue.

Given my ethical position on ads, I disagree with this. I think that one of the ways people can help to shape society for the better is to deliberately (and hopefully, vocally) reject things they find obnoxious. Therefore, the very existence of Ad-Block, and the number of people who have installed it, are a statement in opposition to ad-based financing models. And I hope that it might have some small effect in discouraging organisations from choosing to go that way.

Read more

Following are my long-form notes for a short presentation I gave to the team here at Canonical.


We are all aware that the Internet is truly today's information superhighway.

So much of the world's information today is written in HTML that it's almost synonymous with "information".

HTML is the basic component of the Internet. We all use the Internet. If you take away CSS and JavaScript, you're left with just a whole bunch of HTML.

Understanding the interplay between markup and the Internet is important for anyone who writes content for the Internet.

Simplicity and accessibility

Openness

We write JavaScript, CSS and back-end code for simplicity and clarity so that other developers - probably only the developers in our team - can easily read and work on the code.

HTML is always the most public and central part of all our information, so it is the most important thing to make as simple and intuitive as possible. Our HTML might be downloaded, viewed or hacked around with by anyone. They don't need to be a developer by trade. Anyone who knows how to "view source" can read our markup. Anyone who knows how to click "save web page" can hack around with it.

Good writing

I'd like to suggest that anyone who writes professionally, in today's world, should have some understanding of how markup works.

People in more and more areas have to write markup sometimes. Anyone who writes blogs in Wordpress has probably had to edit the raw markup at some point. But also, anyone who writes in any medium that might be converted into markup at any point in the future should be aware of some of the ways it works.

I would therefore posit that using the correct tag to mark up your information is as important as choosing how to lay out your Word document (headings, bullet-points etc.).

If you're ever writing markup, go and familiarise yourself with the new elements introduced in HTML5. And if you have something new to mark up (e.g. a pull-quote, a code-block or a graph), give it a Google and see what best practice is.

Accessibility

A tempting attitude to take to writing markup is to focus on the average user, or maybe at least users within the inter-quartile range. If you look at Google Analytics, you will see that almost all visits to our sites are from people with modern, HTML5 & ECMAScript 5 capable browsers. As long as things look good on that setup, it's not so important to cover the edge-cases.

I would say that there are likely many flaws in this analysis. One is that maybe instead of hurting 1% of people by not worrying about the edge-cases, we're hurting 50% of the people, 2% of the time. Which, in terms of public opinion, is worse.

For example, if I try to load a website on the train (which I do more often than most, but many people do occasionally), there is a high likelihood that my connection will drop half-way through and I'll get a partially loaded page. At this point, since I will have downloaded the markup first, it is paramount that the markup looks sensible and contains all the relevant information.

Fortunately, there's a simple formula - if you understand the basic components of the web and write in them as simply and straightforwardly as you can first, then most things will just work.

One of the beautiful things about the web is it's actually impossible to predict exactly how people are going to want to use it. But simplicity and directness are your friends.

Referencing

The Internet is a collection of links. The real genius of HTML is its extremely light referencing system.

Referencing has been a core component of scientific work forever, but HTML and the Internet bring that scientific process into the commons.

Not only that, but the whole structure of the Internet depends on references. Good linking makes documents more understandable - it's easy to follow a link to find out more about a base concept you don't properly understand.

People follow links to discover new content, but more importantly, search engines use these links to find new content and to categorise it for searching. The quantity, specificity and wording of your links contribute to the strength of the Internet.

This is where an understanding matters not just to people who write in HTML, but to anyone who writes content for the Internet.

When you're writing, especially if you're explaining a concept, if ever you use a term which you think could be described in more depth, find a link for it. People will thank you.

Rather than just adding the full link into the page's text (e.g. "see: www.example.com"), or writing "click here", add the link to a relevant part of your sentence. This is important because search engines will use your link text to help describe what that link is about.

It's also helpful if your link text is not exactly the same as simply the title of the post you're linking to. This is because it's helpful for that page to be described in many different ways, organically, by people linking to it.

IDs and anchors

Your readers will thank you for specific linking. If the topic you're trying to cover with your link is under a sub-heading half way down the document, see if you can find an anchor which will take them straight there (example.com#heading3).

On the development side, I believe that responsible HTML will contain IDs for this reason. Each heading, sub-heading or useful document section should ideally have an ID set on it, so people can link directly to that section if they need to.
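To show how little work this is, here's a small JavaScript sketch (my own illustration, not code from any particular site or library) that gives every sub-heading a URL-friendly ID if it doesn't already have one:

var headings = document.querySelectorAll('h2, h3');
for (var i = 0; i < headings.length; i++) {
    if (!headings[i].id) {
        headings[i].id = headings[i].textContent
            .trim()
            .toLowerCase()
            .replace(/[^a-z0-9]+/g, '-')   // e.g. "IDs and anchors" becomes "ids-and-anchors"
            .replace(/^-+|-+$/g, '');
    }
}

With something like that in place, a reader can link straight to example.com#ids-and-anchors.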

Thank you

You're not going to do most of what I've said above, most of the time. But I think just keeping it in mind will make a difference. Learning how to write responsibly for the web is a creative and infinite journey. But every time you publish anything, and even better if you make an extra link or find a new more specific markup tag, you're strengthening the Internet. Thank you.

Read more

(Also posted on the Canonical blog)

On 10th September 2014, Canonical are joining in with Internet Slowdown day to support the fight for net neutrality.

Along with Reddit, Tumblr, Boing Boing, Kickstarter and many more sites, we will be sporting banners on our main sites, www.ubuntu.com and www.canonical.com.

Net neutrality

From Wikipedia:

Net neutrality is the principle that Internet service providers and governments should treat all data on the Internet equally, not discriminating or charging differentially by user, content, site, platform, application, type of attached equipment, and modes of communication.

Internet Slowdown day

#InternetSlowdown day is a protest against the FCC’s plans to allow ISPs in America to offer “paid prioritization” of traffic to certain companies.

If large companies were allowed to pay ISPs to prioritise their traffic, it would be much harder for competing companies to enter the market, effectively giving large corporations a greater monopoly.

I believe that internet service providers should conform to common carrier laws where the carrier is required to provide service to the general public without discrimination.

If you too support net neutrality, please consider signing the Battle for the net petition.

Read more

(This article was originally posted on design.canonical.com)

On release day we can get up to 8,000 requests a second to ubuntu.com from people trying to download the new release. In fact, last October (13.10) was the first release day in a long time that the site didn't crash under the load at some point during the day (huge credit to the infrastructure team).

Ubuntu.com has been running on Drupal, but we've been gradually migrating it to a more bespoke Django based system. In March we started work on migrating the download section in time for the release of Trusty Tahr. This was a prime opportunity to look for ways to reduce some of the load on the servers.

Choosing geolocated download mirrors is hard work for an application

When someone downloads Ubuntu from ubuntu.com (on a thank-you page), they are actually sent to one of the 300 or so mirror sites - ideally one that's nearby.

To pick a mirror for the user, the application has to:

  1. Decide from the client's IP address what country they're in
  2. Get the list of mirrors and find the ones that are in their country
  3. Randomly pick them a mirror, while sending more people to mirrors with higher bandwidth

This process is by far the most intensive operation on the whole site, not because these tasks are particularly complicated in themselves, but because this needs to be done for each and every user - potentially 8,000 a second - while every other page on the site can be aggressively cached to prevent most requests from hitting the application itself.

For the site to be able to handle this load, we'd need to load-balance requests across perhaps 40 VMs.

Can everything be done client-side?

Our first thought was to embed the entire mirror list in the thank-you page and use JavaScript in the users' browsers to select an appropriate mirror. This would drastically reduce the load on the application, because the download page would then be effectively static and cache-able like every other page.

The only way to reliably get the user's location client-side is with the geolocation API, which is only supported by 85% of users' browsers. Another slight issue is that the user has to give permission before they can be assigned a mirror, which would slightly hinder their experience.
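For illustration, the client-side-only approach would have looked something like this sketch (pickMirrorNear and pickFallbackMirror are hypothetical helpers, not real code from the site):

// Ask the browser for the user's location - this prompts for permission
// and only works in the ~85% of browsers that support the geolocation API
if (navigator.geolocation) {
    navigator.geolocation.getCurrentPosition(
        function (position) {
            pickMirrorNear(position.coords.latitude, position.coords.longitude);
        },
        function () {
            pickFallbackMirror();  // permission denied or lookup failed
        }
    );
} else {
    pickFallbackMirror();  // no geolocation support at all
}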

This solution would inconvenience users just a bit too much. So we found a trade-off:

A mixed solution - Apache geolocation

mod_geoip2 for Apache can apply server rules based on a user's location and is much faster than doing geolocation at the application level. This means that we can use Apache to send users to a country-specific version of the download page (e.g. the German desktop thank-you page) by adding &country=DE to the end of the URL.

These country specific pages contain the list of mirrors for that country, and each one can now be cached, vastly reducing the load on the server. Client-side JavaScript randomly selects a mirror for the user, weighted by the bandwidth of each mirror, and kicks off their download, without the need for client-side geolocation support.
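The weighted random selection itself is simple enough. Here's a minimal sketch of the idea (the mirror list, field names and filename are invented for illustration):

// Pick a mirror at random, weighted by bandwidth, so faster mirrors receive more of the traffic
function pickMirror(mirrors) {
    var totalBandwidth = 0;
    for (var i = 0; i < mirrors.length; i++) {
        totalBandwidth += mirrors[i].bandwidth;
    }
    var threshold = Math.random() * totalBandwidth;
    for (var j = 0; j < mirrors.length; j++) {
        threshold -= mirrors[j].bandwidth;
        if (threshold <= 0) {
            return mirrors[j];
        }
    }
    return mirrors[mirrors.length - 1];  // floating-point safety net
}

var mirror = pickMirror([
    { url: 'http://mirror-a.example.com/', bandwidth: 1000 },
    { url: 'http://mirror-b.example.com/', bandwidth: 100 }
]);
window.location = mirror.url + 'ubuntu-14.04-desktop-amd64.iso';  // kick off the download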

This solution was successfully implemented shortly before the release of Trusty Tahr.

Read more

Docker is a fantastic tool for running virtual images and managing light Linux containers extremely quickly.

One thing this has been very useful for in my job at Canonical is quickly running older versions of Ubuntu - for example to test how to install specific packages on Precise when I'm running Trusty.

Installing Docker

The simplest way to install Docker on Ubuntu is using the automatic script:

curl -sSL https://get.docker.io/ubuntu/ | sudo sh

You may then want to authorise your user to run Docker directly (as opposed to using sudo) by adding yourself to the docker group:

sudo gpasswd -a [YOUR-USERNAME] docker

You need to log out and back in again before this will take effect.

Spinning up an old version of Ubuntu

With docker installed, you should be able to run it as follows. The below example is for Ubuntu Precise, but you can replace "precise" with any available ubuntu version:

mkdir share  # Shared folder with docker image - optional
docker run -v `pwd`/share:/share -i -t ubuntu:precise /bin/bash  # Run ubuntu, with a shared folder
root@cba49fae35ce:/#  # We're in!

The -v `pwd`/share:/share part mounts the local ./share/ folder at /share/ within the Docker instance, for easily sharing files with the host OS. Setting this up is optional, but might well be useful.

There are some important things to note:

  • This is a very stripped-down operating system. You are logged in as the root user, your home directory is the filesystem root (/), and very few packages are installed. Almost always, the first thing you'll want to run is apt-get update. You'll then almost certainly need to install a few packages before this instance will be of any use.
  • Every time you run the above command it will spin up a new instance of the Ubuntu image from scratch. If you log out, retrieving your current instance in that same state is complicated. So don't log out until you're done. Or learn about managing Docker containers.
  • In some cases, Docker will be unable to resolve DNS correctly, meaning that apt-get update will fail. In this case, follow the guide to fix DNS.

Read more

Fix Docker's DNS

Docker is really useful for a great many things - including, but not limited to, quickly testing older versions of Ubuntu. If you've not used it before, why not try out the online demo?

Networking issues

Sometimes docker is unable to use the host OS's DNS resolver, resulting in a DNS resolve error within your Docker container:

$ sudo docker run -i -t ubuntu /bin/bash  # Start a docker container
root@0cca56c41dfe:/# apt-get update  # Try to Update apt from within the container
Err http://archive.ubuntu.com precise Release.gpg
Temporary failure resolving 'archive.ubuntu.com'  # DNS resolve failure
..
W: Some index files failed to download. They have been ignored, or old ones used instead.

How to fix it

We can fix this by explicitly telling Docker to use Google's DNS public server (8.8.8.8).

However, within some networks (for example, Canonical's London office) all public DNS will be blocked, so we should find and explicitly add the network's DNS server as a backup as well:

Get the address of your current DNS server

From the host OS, check the address of the DNS server you're using locally with nm-tool, e.g.:

$ nm-tool
...
  IPv4 Settings:
    Address:         192.168.100.154
    Prefix:          21 (255.255.248.0)
    Gateway:         192.168.100.101

    DNS:             192.168.100.101  # This is my DNS server address
...

Add your DNS server as a 2nd DNS server for Docker

Now open up the docker config file at /etc/default/docker, and update or replace the DOCKER_OPTS setting to add Google's DNS server first, with yours as a backup: --dns 8.8.8.8 --dns [YOUR-DNS-SERVER]. E.g.:

# /etc/default/docker
# ...
# Use DOCKER_OPTS to modify the daemon startup options.
DOCKER_OPTS="--dns 8.8.8.8 --dns 192.168.100.102"
# Google's DNS first ^, and ours ^ second

Restart Docker

sudo service docker restart

Success?

Hopefully, all should now be well:

$ sudo docker run -i -t ubuntu /bin/bash  # Start a docker container
root@0cca56c41dfe:/# apt-get update  # Try to Update apt from within the container
Get:1 http://archive.ubuntu.com precise Release.gpg [198 B]  # DNS resolves properly
...

Read more

If you glance up to the address bar, you will see that this post is being served securely. I've done this because I believe strongly in the importance of internet privacy, and I support the Reset The Net campaign to encrypt the web.

I've done this completely for free. Here's how:

Get a free certificate

StartSSL isn't the nicest website in the world to use. However, they will give you a free certificate without too much hassle. Click "Sign up" and follow the instructions.

Get an OpenShift Bronze account

Sign up to a RedHat OpenShift Bronze account. Although this account is free to use, as long as you only use 1-3 gears, it does require you to provide card details.

Once you have an account, create a new application. On the application screen, open the list of domain aliases by clicking on the aliases link (might say "change"):

Application page - click on aliases

Edit your selected domain name and upload the certificate, chain file and private key. NB: Make sure you upload the chain file. If the chain file isn't uploaded initially it may not register later on.

Pushing your site

Now you can push any website to the created application and it should be securely hosted.

Given that you only get 1-3 gears for free, a static site is more likely to handle high load. For instance, this site gets about 250 visitors a day and runs perfectly fine on the free resources from OpenShift.

Read more

Following are some guidelines about Agile philosophy that I wrote for my team back in September 2012.

I also wrote a popular StackExchange answer about Agile project planning which you might find useful if you're thinking about implementing Agile.


Agile software development is a philosophy for managing software projects and teams. It has similarities to lean manufacturing principles for "eliminating waste".

The philosophy centers around the agile manifesto:

We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

That is, while there is value in the items on the right, we value the items on the left more.

Of the various software development methodologies out there, Scrum and Extreme programming particularly try to follow agile software development principles.

Lean software development is also rapidly gaining support within the agile community.

Agile practices and principles

Without choosing to follow any one defined methodology for project management, here are some common practices that could be adopted by an agile team:

Read more

I wrote this set of programming principles for my team to follow back in 2012. I'm sure there are many like it, but this one is mine. May you find it useful.

Writing code

Try to write expressive code.

Beware code bloat - adhere to the YAGNI principle

Practicing Behaviour-Driven Development can help with both of these aims.

Do less: Before writing a new piece of functionality, go and look for similar solutions that already exist and extend them.

Code architecture

Namespace your classes, and code to an interface (this is an implementation of the Design by Contract principle), and make your interfaces (both programming interfaces and user-interfaces) as simple as possible.

Try to learn and comply with all 5 principles of SOLID (watch this great video).

Learn as many Design Patterns as you can to inform your coding, but beware of implementing them blindly. Developers can be over-zealous in their use of Design Patterns and may end up over-engineering a solution.

Some useful design patterns:

Tools

Try to learn an IDE with advanced features. These can really save you a lot of time:

  • Syntax highlighting
  • Auto-complete for function, class and method names
  • Auto-formatting
  • Code navigation help - e.g. jump to class declaration
  • Collapsing of code blocks
  • Overviews of code, e.g. a list of all methods within a class
  • Debugging tools like break points

Some suggestions:

Read more

Luminous beings are we

A diary entry from October 15th 2013

Today I very much wanted to work on my voice. Work out how to get my message across - feel like I was saying something genuine, something of significance.

I like the idea of sketches. Particularly sketches about systems and networks. How everyone is connected, and human society grows like an organism, each little autonomous cell influencing each other one. We are like a neural network.

And I wanted to illustrate how these autonomous nodes make up an ebbing and flowing tide, with each individual or group potentially changing the direction of the tide. We are all connected, we all influence each other, we all have power to change the flow of the tide, but we also are swept along by it. I find this vision inspiring but not intimidating. Any one of us can be the instigator of a change of direction, but we are under no pressure to be.

Hmm. Some academics probably study sentient fluids.... Like traffic. That would be an interesting topic.

People grow and develop in this way too. We rush or stagnate through deliberate or accidental events. We are none of us ultimately in control. I believe this absolves any one person of too much responsibility, but at the same time we are all responsible. I wish I could communicate this idea succinctly. I hope a vision like this can lead to people judging each other less. It's hard to explain how.

I think this is like a hacker's vision. There are endless possibilities for this organism. No-one knows where it will go. There is no defined end-goal. We are constantly discovering. Every individual life is a unique exploration. There can be no higher goal than to explore, finding solutions and perspectives that are unique, continuing the exploration.

This is hacking - life is hacking.

But somehow I feel like I'm letting down this purpose. I am not exploring as much as I could be. I'm somewhat stagnating. I'd like to be inspiring people, and communicating my thoughts and ideas honestly. I certainly feel like I have thoughts and ideas, unique perspectives, and my current job and my current lifestyle are not realising one tenth of them. How to solve this?

That'll do for now. Goodnight, diary.

Read more

Writing expressive code

As any coder gains experience, they inevitably learn more and more ways to solve the same problem.

The very first consideration is simplicity. We probably want to use as simple and direct a solution as possible - to avoid over-engineering. But the simplest solution is not necessarily the shortest solution.

After simplicity, the very next consideration should be expressiveness. You should always be thinking about how deeply a new developer is going to have to delve into your code to understand what's going on.

Code is poetry

Writing expressive code may help future coders to understand what's going on. It may even help you in the future. But it may also help you simply to understand the problem. Thinking carefully about how to define and encapsulate the components of your solution will often help you to understand the problem better, leading to a more logical solution.

"Self-documenting code"

"Self-documenting code" is about structuring your code and choosing your method and variable names so that your code will be largely self-describing. This is a great practice, and can make some comments redundant:

$user = new User(); // create a new user object
$user->loadFromSession($session); // update the user from the session
if ($user->isAuthenticated()) { ... } // If the user is authenticated...

However, as a recent discussion with a friend of mine highlighted to me, expressive code is not a replacement for comments - no code is entirely "self-documenting". Always write as expressively as you can, but also always document where it makes sense. Methods, functions and classes should always be summarised with a comment - as mentioned in the Python coding conventions.

Wording

It's worth thinking carefully about how you name your variables and methods.

Don't abbreviate

var uid = 10; // I am unlikely to know what uid stands for without context
var userIdentifier = 10; // Better

Be specific

Use as concrete and specific nouns as you can to describe methods and functions:

var event; // bad - generic
var newsLinkClickEvent; // good - specific

Encapsulation

No-one likes to read a really long procedural program. It's very difficult to follow. It's much easier to read a shorter set of well-encapsulated method calls. If you need to delve deeper, simply look in the relevant method:

// Instead of showing you all the details of how we update the user
// We encapsulate that in the updateDetails method
// allowing you to quickly see the top-level processes
function saveUserDetails(userStore, userDetails) {
    var user = new User();
    user.updateDetails(userDetails); // sets a whole bunch of details on the user
    userStore.save(user); // Converts user data into the correct format, and then saves it in the user store
}

Do you need an else?

The use of many if .. else conditionals makes programs confusing. In many cases, the else part can be encapsulated in a separate method or function call, making the program easier to read:

// With the else
if (user.permissionGroup == 'administrator') {
    article.delete();
} else {
    page.showError("Sorry you don't have permission to delete this article");
}
// Without the else
if (!user.deleteArticle(article)) {
    page.showError("Sorry you don't have permission to delete this article");
}

In cases where a switch is used, or multiple if .. else if statements, you could consider using different types instead:

class User {
    function deleteArticle($article) {
        $success = false;

        if (
            $this->permissionGroup == 'administrator'
            || $this->permissionGroup == 'editor'
        ) {
            $success = $article->delete();
        }

        return $success;
    }
}

You can remove the need for this if, by making special types:

trait ArticleDeletion {
    function deleteArticle($article) {
        return $article->delete();
    }
}

class Editor extends User { use ArticleDeletion; }
class Administrator extends User { use ArticleDeletion; }

Notice that I've deliberately opted not to make Administrator inherit from Editor, but instead compose them separately. This keeps my structure more flat and flexible. This is an example of composition over inheritance.

Depth

While encapsulation is often a good thing, to make programs easier to understand at the higher level, it's important to preserve the single responsibility principle by not encapsulating separate concerns together.

For example, one could write:

var user = new User();
user.UpdateFromForm(); // Imports user data from the page form
user.SaveToDatabase();

While this is both short and fairly clear, it suffers from two other problems:

  • The reader has to delve further into the code to find basic information, like the name of the Database class, or which form the details are stored in
  • If we want to use a different instance of the Database, we have to edit the User class, which doesn't make a whole lot of sense.

In general you should always pass objects around, rather than instantiating them inside each other:

var user = new User();
var userData = Request.Form;
var database = new DatabaseManager();

user.ImportData(userData);
database.Save(user);

This is more lines, but it is nonetheless clearer what is actually happening, and it's more versatile.

Tidiness

Always try to format your code so that it is easily readable. Don't be afraid of white space, and use indentation sensibly to highlight the structure of your code.

Where there is an accepted code style guide, you should try to follow it. For example, PHP has the FIG standards.

However, I don't think it's worthwhile being overly anal about code standards (my thinking has evolved on this somewhat) because you'll never be able to get everybody to code exactly the same way. So if (like me) you're a coder who feels the need to reformat code whenever you see it to make it fit in with anal standards, you could probably do with training yourself out of that habit. As long as you can read it, leave it be.

Delete commented out code

If you're using a version control system (like Git) there really is no need to keep large blocks of commented-out or unused code. You should just delete it, to keep your codebase tidier. If you really need it again, you can just go and find it in the version control history.

Trade-offs

There will always be a trade-off between expressiveness and succinctness.

Depth vs. encapsulation

It is desirable to keep as flat a structure as possible in your objects, so that programmers don't have to delve through parent class after parent class to find the relevant bit of code. But it is also important to keep code encapsulated in logical units.

Both goals are often achievable through composition over inheritance, using dependency injection or traits / multiple inheritance.

Special syntax

In many languages there are often slightly obscure constructs that can nonetheless save time. With many of these there is a readability vs. simplicity trade-off.

Ternary operators and null coalescing

C# has a null coalescing operator, and PHP has a similar shorthand (the "Elvis" operator):

var userType = user.Type ?? defaultType; // C#
$userType = $user->Type ?: $defaultType; // PHP

And almost all languages support the ternary operator:

var userType = user.Type != null ? user.Type : defaultType;

Both of these constructs are much more succinct than a full if .. else construct, but they are less semantically clear, hence the trade-off. Personally, I think it's fine to use the ternary operator in simple conditionals like this, but if it gets any more complicated then you should always use a full if .. else statement.

Plugins / libraries

Libraries can also offer much more succinct ways of expressing common operations. For example, in C#:

Fish brownFish = null;  // assuming a Fish class for the items in fishes

foreach (var fish in fishes) {
    if (fish.colour == "brown") {
        brownFish = fish;
        break;
    }
}

Can be simplified with the Linq library:

using System.Linq;

var brownFish = fishes.First(fish => fish.colour == "brown");

The latter is clearly simpler, and hopefully not too difficult to understand, but it does require:

  1. Knowledge of the Linq library
  2. An understanding of how lambda expressions work

I think that in this case the Linq solution is so much simpler and quite expressive enough that it should definitely be preferred - and hopefully if another developer doesn't know about Linq, it will be quite easy for them to pick up, and will expand their knowledge.

Single-use variables

While the following variable is pointless:

var arrayLength = myArray.length;

for (var arrayIterator = 0; arrayIterator < arrayLength; arrayIterator++) { ... }

There are some cases where variables can be used to add useful semantic meaning:

var slideshowContainer = jQuery('main>.show');

slideshowContainer.startSlideshow();

Read more

In the last couple of months I've had a number of discussions with people who were under the impression that encryption has been cracked by the NSA.

If you like, jump straight to what you can do about it.

The story

The story started in September, in the Guardian:

NSA and GCHQ unlock encryption used to protect emails, banking and medical records

(Guardian - Revealed: how US and UK spy agencies defeat internet privacy and security, James Ball, Julian Borger and Glenn Greenwald, 5th September 2013)

This came up again today, because Sir Tim Berners-Lee made a statement:

In an interview with the Guardian, he expressed particular outrage that GCHQ and the NSA had weakened online security by cracking much of the online encryption on which hundreds of millions of users rely to guard data privacy.

(Guardian - Tim Berners-Lee condemns spy agencies as heads face MPs, Ed Pilkington, 7th November 2013)

And something very similar to this was stated in the Radio 4 news program I was listening to this morning.

The worry

On the face of it this sounds like the NSA's geniuses have reverse-engineered some core cryptographic principles - e.g. worked out how to quickly deduce prime factors from a public key (read an explanation of RSA).
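To see why that would matter, here's a toy JavaScript sketch of RSA using the classic textbook primes (an illustration of the maths only - real keys use 2048-bit numbers). The whole scheme rests on nobody being able to recover p and q from the public modulus n in a reasonable time:

// Toy RSA: the public key is (n, e), the private key is d
const p = 61n, q = 53n;
const n = p * q;                       // public modulus: 3233
const phi = (p - 1n) * (q - 1n);       // 3120
const e = 17n;                         // public exponent
const d = 2753n;                       // private exponent: (e * d) % phi === 1n

function modPow(base, exp, mod) {      // square-and-multiply modular exponentiation
    let result = 1n;
    base %= mod;
    while (exp > 0n) {
        if (exp & 1n) result = (result * base) % mod;
        base = (base * base) % mod;
        exp >>= 1n;
    }
    return result;
}

const message = 65n;
const cipher = modPow(message, e, n);  // encrypt with the public key: 2790n
const plain = modPow(cipher, d, n);    // decrypt with the private key: back to 65n

Deduce the two primes from n and you can reconstruct d yourself; with real key sizes, that factorisation is (as far as anyone knows) computationally infeasible.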

This would be very serious. I was sceptical though, because I believe that if there were key vulnerabilities in public algorithms, the public would have found them long before the NSA. They don't have a monopoly on good mathematicians. This is, after all, why open-source code and public algorithms are inherently more secure.

The truth

Helpfully, MIT Technology Review published an article 4 days later clarifying what the NSA had likely achieved:

New details of the NSA’s capabilities suggest encryption can still be trusted. But more effort is needed to fix problems with how it is used.

(NSA Leak Leaves Crypto-Math Intact but Highlights Known Workarounds, Tom Simonite, 9th September 2013)

This shows that (still as far as we know) the NSA have done nothing unprecedented. They have, however, gone to huge lengths to exploit every known vulnerability in security systems, regardless of legality. Mostly, these vulnerabilities are with the end-point systems, not the cryptography itself.

What the NSA and GCHQ have done

I've tried to list these in order of severity:

  • Intercepted huge amounts of encrypted and unencrypted internet traffic
  • Used network taps to get hold of Google and Yahoo's (and probably others') unencrypted private data as it's transferred between their servers
  • Acquired private-keys wherever they can, presumably through traditional hacking methods like brute-forcing passwords, social engineering, or inside contacts.
  • Built back doors into certain commercial encryption software products (most notably, Microsoft)
  • Used brute-force attacks to find weaker (1024-bit) RSA private keys
  • Used court orders to force companies to give up personal information

A word about RSA brute-forcing

We have known for a while that 1024-bit RSA keys could feasibly be brute-forced by anyone with enough resources - and many assumed that the U.S security agencies would almost certainly be doing it. So for the more paranoid among us, this should be no surprise.

“RSA 1024 is entirely too weak to be used anywhere with any confidence in its security” says Tom Ritter

However, MIT also claim that these weaker keys are:

used by most websites that offer secure SSL connections

This surprises me, as I know that GoDaddy at least won't sell you a certificate for a key shorter than 2048-bit - and I would assume other certificate vendors would follow suit. But maybe this is fairly recent.

However, even if "most websites" use RSA-1024, it doesn't mean that the NSA is decrypting all of this encrypted traffic, because it still requires a huge amount of resources (and time) to do, and the sheer number of such keys being used will also be huge. This means the NSA can only be decrypting data from specifically targeted sites. They won't have decrypted all of it.

What you can do

Now that we know this is going on, it only means that we should be more stringent about the security best-practices that already existed:

  • Use only public, open-source, tried and tested programs and algorithms
  • Use 2048-bit or longer RSA keys
  • Configure secure servers to prefer "perfect forward secrecy" cyphers
  • Avoid the mainstream service providers (Google, Yahoo, Microsoft) where you can
  • Secure your end-points: disable your root login; use secure passwords; know who has access to your private keys

Read more

On Saturday night, there was a big fight outside one of our night-clubs here in Nottingham, in which 3 people were stabbed.

BBC publishing stupid opinions

The BBC wrote an article, including a quote from the nightclub owner:

This is not a localised problem, knife crime is becoming a huge national issue. Community sentences and conditional discharges do nothing to discourage criminals.

and the pull-quote:

Tougher sentences needed

I don't understand why the BBC felt the need to give a platform to this particular schmuck. It is the responsibility of journalists, in my opinion, to stem the tide of sensationalism after events like this - after all, they should understand better than anyone the frequency with which stories like this occur.

The truth about knife crime

According to knife crime statistics from parliament.uk:

The number of knife offences recorded (during the year to June 2012) was 9% lower than in the preceding year.

NHS data suggests there were 4,490 people admitted to English hospitals in 2011/12 due to assault by a sharp object. The lowest level since 2002/03.

Similarly, the Office for National Statistics has stats showing that total knife-related offences in the year to March 2013 is 26,336, down from 31,147 the previous year.

So, knife crime is not "becoming" any kind of problem. It's an old problem, and it's improving. So shut up, Simon Raine.

Also, I don't believe "tougher custodial sentences" have ever been the best solution. I don't have time to find the evidence now, but I believe custodial sentences only harden criminals, and that rehabilitation is the way forward. And the police and the justice system are slowly realising this - which may be partly helping the knife crime stats. Don't let stupid opinions like these derail that effort.

Read more

If you want a tool to crawl through your site looking for 404 or 500 errors, there are online tools (e.g. The W3C's online link checker), browser plugins for Firefox and Chrome, or Windows programs like Xenu's Link Sleuth.

A unix link checker

Today I found linkchecker - available as a unix command-line program (although it also has a GUI or a web interface).

Install the command-line tool

You can install the command-line tool simply on Ubuntu:

sudo apt-get install linkchecker

Using linkchecker

Like any good command-line program, it has a manual page, but it can be a bit daunting to read, so I give some shortcuts below.

By default, linkchecker will give you a lot of warnings. It'll warn you about any links that result in 301s, as well as all 404s, timeouts and other errors, and it gives you status updates every second or so.

Robots.txt

linkchecker will not crawl a website that is disallowed by a robots.txt file, and there's no way to override that. The solution is to change the robots.txt file to allow linkchecker through:

User-Agent: *
Disallow: /
User-Agent: LinkChecker
Allow: /

Redirecting output

linkchecker seems to be expecting you to redirect its output to a file. If you do so, it will only put the actual warnings and errors in the file, and report status to the command-line:

$ linkchecker http://example.com > siteerrors.log
35 URLs active,     0 URLs queued, 13873 URLs checked, runtime 1 hour, 51 minutes

Timeout

If you're testing a development site, it's quite likely it will be fairly slow to respond and linkchecker may experience many timeouts, so you probably want to up that timeout time:

$ linkchecker --timeout=300 http://example.com > siteerrors.log

Ignore warnings

I don't know about you, but the sites I work on have loads of errors. I want to find 404s and 50*s before I worry about redirect warnings.

$ linkchecker --timeout=300 --no-warnings http://example.com > siteerrors.log

Output type

The default text output is fairly verbose. For easy readability, you probably want the logging to be in CSV format:

$ linkchecker --timeout=300 --no-warnings -ocsv http://example.com > siteerrors.csv

Other options

If you find and fix all your basic 404 and 50* errors, you might then want to turn warnings back on (remove --no-warnings) and start using --check-html and --check-css.

Checking websites with OpenID (2014-04-17 update)

Today I had to use linkchecker to check a site which required authentication with Canonical's OpenID system. To do this, a StackOverflow answer helped me immensely.

I first accessed the site as normal with Chromium, opened the console window and dumped all the cookies that were set in that site:

> document.cookie
"__utmc="111111111"; pysid=1e53e0a04bf8e953c9156ea841e41157;"

I then saved these cookies in cookies.txt in a format that linkchecker will understand:

Host:example.com
Set-cookie: __utmc="111111111"
Set-cookie: pysid="1e53e0a04bf8e953c9156ea841e41157"

And included it in my linkchecker command with --cookiefile:

linkchecker --cookiefile=cookies.txt --timeout=300 --no-warnings -ocsv http://example.com > siteerrors.csv

Use it!

If you work on a website of any significant size, there are almost certainly dozens of broken links and other errors. Link checkers will crawl through the website checking each link for errors.

Link checking your website may seem obvious, but in my experience hardly any dev teams do it regularly.

You might well want to use linkchecker to do automated link checking! I haven't implemented this yet, but I'll try to let you know when I do.

Read more

SeeTheStats is a great free service for exposing your Google Analytics data (the only way to do Analytics) to the public.

Here is some information about my site:

How many people visit my site?

What country are they from?

What pages are they looking at?

What browsers are they using?

What operating systems are they using?

How big are their screens?

My SeeTheStats page

You can also see all these stats over at SeeTheStats.com.

Read more

With the advent of web fonts (e.g. from Google Fonts), thankfully web designers are no longer tied to a limited set of "web safe" fonts.

Fonts and performance

However, there is a potential performance hit with this. You will need to link your CSS files to the font files. The problem here isn't so much the size of the font file (they are typically under 100 KB), it's more that each new HTTP request that a page makes affects performance.

Also, when loading web fonts externally you will sometimes see a flicker where the page loads initially with the default browser fonts, and then the new fonts are downloaded and applied afterwards. This flicker can look quite unprofessional.

Font formats and IE8

If you want to support Internet Explorer 8 or older, you unfortunately need to include your fonts in two formats: WOFF and EOT.

However, if you're willing to drop IE8 support (and reap the benefits), or to simply serve the browser default font to IE8, then you can provide your fonts in WOFF only, which is supported by all other relevant browsers.

Data URLs

So Data URLs, if you haven't heard of them, are a way of encoding binary data as a valid URL string. This means the data can be included directly inside HTML or CSS files. They are fantastically easy to create by simply dragging your binary file into the Data URL Creator.
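If you'd rather script it than drag files into a web page, a minimal Node.js sketch (assuming a local lato-light.woff file - the filename is just an example) does the same job:

// Base64-encode a font file and print a data URL ready to paste into a CSS @font-face rule
var fs = require('fs');

var font = fs.readFileSync('lato-light.woff');
var dataUrl = 'data:application/x-font-woff;base64,' + font.toString('base64');

console.log(dataUrl.substring(0, 60) + '...');  // preview; paste the full string into your CSS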

Data URLs are likely to be a bit larger than the binary file would have been. In my experience they tend to be about 20% larger. So the larger the file you're dealing with the less practical it becomes to encode the file as a URL. However, for sub-100k web fonts this difference is not so important.

So using Data URLs, you can include your font directly in your CSS like so:

/* http://www.google.com/webfonts/specimen/Lato */
@font-face {
    font-family: 'Lato light';
    font-style: normal;
    font-weight: 300;
    src: local('Lato Light'), url('data:application/x-font-woff;base64,d09GRg...BQAAAAB') format('woff');
}

(For example, here's what I use for this very site)

This will now mean that your web pages will only have to download one CSS file, rather than a CSS file and a bunch of font files, which will help performance. Personally I think it's also neat not to have to create a special directory for font files. Keeping it all in one place (CSS) just seems nice and neat to me.

A word about caching

Whether the above suggestion is actually a good idea will depend on how often your CSS changes. Hopefully you'll be merging your CSS files into one file already to reduce HTTP requests. This of course means that whenever that merged CSS file changes, your users will have to download the whole file again to see your changes.

If your fonts were downloaded as separate files, rather than being included in your CSS, then the fonts may well be cached even if the CSS has changed. However, if you include your fonts inside your CSS files as suggested above, this will mean that whenever your CSS changes a much larger CSS file will have to be downloaded each time. Including your fonts inside your CSS is likely to double the size of your CSS file.

This is a complex decision, but to give you some rough advice I'd say - if your CSS changes more than a couple of times a month then keep your fonts as separate files. If it's less often (as it is with this site) then it's probably worth including them inside the CSS as Data URLs.

If you have a different opinion on this, please let me know in the comments.

Read more

I am always thinking about good general rules for making the world a better place, but it's extremely difficult to succinctly communicate them to anyone.

This is the story of how my friends and I created and agreed on a statement of values.

The foundation

A couple of months ago, I was in an IRC chat room with some friends of mine (do people actually still use IRC? tell me in the comments), and @0atman aired an idea for a charitable project. We all thought it was a good one, and a long discussion ensued about the best way to run the project.

We all felt that it should be run democratically to some extent - that is, largely owned by its members - but we were worried about the project being hijacked and becoming something that none of us wanted it to be.

A potential solution, we felt, was to first create a foundation with exclusive membership and a solid stated set of values. That way, the project could be started by the foundation, but not inherently attached to it, meaning that if the project took a different direction, the foundation would remain intact. This would allow us to either create a fork of the project, bringing it back in line with our values, or start a completely new one, while allowing the existing project to continue in its new direction with our blessing.

Thus was formed the Blackgate Foundation.

(Nothing has come of the project idea yet. I hope it may in the future.)

Arguments over values

Since we formed the foundation specifically to be a solid moral centre for our future projects, the values of the foundation were paramount, so we started debating them in earnest.

Politically and morally we have a lot of things in common, but it was surprising how much we found to argue about. We disagreed about the necessity for punishment, whether there's ever a case to go to war, whether utilitarianism was a term we could or should associate ourselves with, whether we agreed with the values of humanism, and our opinions on religion.

We discussed it for days, on IRC and in comments and edits on a Google Document (I don't want to advertise Google particularly, but Google Documents really are an amazingly effective way to collaborate with people). It got kinda heated at times. But eventually we came out with a largely agreed upon statement of values, and I think our individual values all changed a little along the way.

The statement of values

I am proud of what we produced, and I had a lot of fun doing it. I think it sums up my values rather well. I think it's firm and clear without being offensive or inflammatory. I'd love to know what you think of it - please let me know in the comments.

It can be seen on the Blackgate Foundation website or in our GitHub repository, but I'm also reproducing it here in its current form (we may decide to change it in the future):

Statement of values

We, the members of the Blackgate Foundation, value:

Equality

  • Humanity should strive to treat and provide for all people equally regardless of appearance, sexuality, gender, beliefs, ability or actions.
  • All people should be equally represented and no person fundamentally deserves to be better off than any other.

Science & openness

  • The pursuit of knowledge is a human instinct and a universal force for good.
  • There is value in sceptical, evidence-based and objective reasoning in the pursuit of knowledge.
  • Knowledge should be made available to all of humanity. We should strive to build on existing work rather than doing work from scratch.
  • There is value in open processes and collective decision making - many eyes guard against injustices and inefficiencies.

Diversity

  • Diversity is important in all things. Many opinions and diverse practices prevent stagnation, create resilience through redundancy, and speed evolution and learning.
  • Centres of control should be diverse and small and subservient and answerable to all over whom they hold influence. Any decisions by such centres should be evidence based and open to discussion.
  • The interests of humanity should always come before those of any individual or group; this applies particularly to corporate protectionism and nationalism.

Pacifism

  • Violence in all its forms is divisive and inflammatory and therefore always undesirable.
  • We renounce the glorification of violence and the use of violence to solve disputes.
  • It is in the interest of humanity to seek to understand and help those who act violently.

Evidence-based morality

  • Morality is not absolute. Moral guidelines should be formed through evidence-based reasoning.
  • There exist solid evidence-based arguments for the most universally accepted moral tenets.
  • "Bad" and "evil" are counter-productive concepts. Humanity should strive to avoid ultimately judging any person as either.

Sustainability

  • All human activity should continually strive to be sustainable. Notable examples are human impact on the environment and the global economy.

Try it!

Why don't you try writing down your morals and values in a similar form? Or do it with some friends? I really enjoyed it and couldn't recommend it more.

Read more

Static site generators (like Jekyll and Hyde) offer a much simpler and more transparent way to create a website. There's a small learning curve, but it's totally worth it. Especially if you're a developer already.

What is a static site generator?

A piece of software that can read a set of files in a particular format and convert them into static files (e.g. HTML &c.) that can then be served directly as a website.

Note that just because a site is static on the server-side doesn't mean it can't be dynamic on the client-side. You can easily include comments and other dynamic functionality through JavaScript plugins.
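For example, a static page can still pull in comments from a third-party service when it loads. A rough sketch (the URL, element ID and response format are all invented for illustration):

// Fetch comments from a hypothetical comments API and render them into the page
fetch('https://comments.example.com/posts/why-i-love-the-internet')
    .then(function (response) { return response.json(); })
    .then(function (comments) {
        document.getElementById('comments').innerHTML = comments
            .map(function (comment) {
                return '<p><strong>' + comment.author + ':</strong> ' + comment.body + '</p>';
            })
            .join('');
    });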

The workflow goes something like this:

$ sublime-text _posts/2013-05-30-why-i-love-the-internet.md # create a new blog post
$ jekyll --server # build the static site into my _site/2013/05/30/why-i-love-the-internet.html directory and run a test server
# check the site and my new blog post look okay
$ git add . && git commit -m 'new post: why i love the internet' # save it in version control
$ git push heroku # release the change to my live site (I use heroku)

Why bother?

Personally I think static sites make managing websites really fun.

For the right kind of project, static sites can make it so much simpler to manage a site. They remove a whole bunch of concerns that you used to have to worry about (e.g. with CMSs like Wordpress or Drupal, or frameworks like Django, Rails or Symfony):

  • caching - You can forget about server-side caching, since you're already serving static files
  • databases - You don't need a database - all the data is stored as files
  • version control - You can easily keep your whole site, including document changes, in version control
  • easy to start - You hardly have to write any code to get started.
  • easy to maintain - Tweaking your site is more transparent and direct - you can easily view and edit the static files directly.

Which sites make sense?

Any site that needs to do anything complex on the server-side will not be appropriate. However, any site which is basically just a collection of static information - like a blog, a brochure site, or even a news or magazine site - could work as a static site.

The other important thing is that everyone who wants to be able to edit the site needs to learn how to do it.

This needn't necessarily exclude anyone. Many static site generators use Markdown document syntax, which anyone can understand and learn. Github even lets you edit files directly online, so anyone with permission can edit the website files from their browser. Editors will have to understand the concept of version control, and understand how the site structure works, but this shared understanding will probably aid rather than hinder your project's progression.

In any case, if the only people who edit the site directly are developers then using a static site generator should come absolutely naturally.

How?

There are many static site generators out there, written in many different languages.

Personally I use jekyll for my website. Originally this was because it is natively supported in Github Pages.

I'm not going to go into how to use Jekyll in depth in this post, but I'll try to write another couple of posts soon:

  1. How to set up a basic static site with Jekyll on Github Pages
  2. How to host a Jekyll-based site on Heroku

Read more

I have many interests, but I think there are two common threads running through them all:

  • I care deeply, fundamentally about fairness and equality
  • I am very interested in complex systems

"Complex systems" sounds extremely abstract, but I think it really is the core of my academic interest. I like mapping systems in my head, seeing the nodes; seeing the ways they interact with each other. I like working out how to create elegant systems and optimal systemic solutions for solving problems.

This leads me in two directions:

  1. I love technology. Technology, along with all the problems it's trying to solve, creates and makes use of myriad systems and systemic structures. I love trying to understand these systems.
  2. I love social systems and social science. People are complex, and there are extremely subtle and nuanced rules governing how they think and interact in social systems. I love pondering people and psychology.

Running through all the mini projects and fancies that flow from my interest in systems is my deep desire for global fairness and equality. I believe that technology has the capacity to be a great equaliser. Most people in the world don't really have a voice to influence the global power-structures, but hopefully the internet and communications technology can give them that voice.

In a nutshell, this is why I love the internet.

Read more

Chrome version 25 appears to have made a pretty serious change to how the HTML5 input date type is rendered.

Now the date type defaults to display: -webkit-inline-flex, and (this is the bad bit) if you use display: block the layout breaks:

date field layout broken

(try it yourself)

Why is this bad?

We use the date type on arena blinds, and to have more control over the layout of the input fields, they are all set to display: block. I think this is, if not "best", at least a pretty common practice.

So one day we realised our date fields looked broken in Chrome, and it was because of this issue. So my boss said:

If we can't rely on the date control not to break, we have to abandon the HTML5 date field altogether

And that's entirely fair reasoning.

Cognitive dissonance

My boss's perfectly reasonable conclusion goes against everything progressive that I've been trying to instil in my team.

Progressive enhancement is accepted best practice nowadays - to use the built-in functionality when it's there, with fall-backs for browsers that don't support it. E.g.:

if (!Modernizr.inputtypes['date']) {
    $('input[type=date]').datepicker();
}

This is a solid approach I strongly believe in. But if Chrome are going to implement breaking changes like this, I don't know what to think any more.

Chrome, you've ruined my day.

Read more