Canonical Voices

Posts tagged with 'tech'

Christian Brauner


For a long time LXD has supported multiple storage drivers. Users could choose between zfs, btrfs, lvm, or plain directory storage pools, but they could only ever use a single storage pool. A frequent feature request was to support not just a single storage pool but multiple storage pools. That way users could, for example, maintain a zfs storage pool backed by an SSD for very I/O-intensive containers and a simple directory-based storage pool for other containers. Luckily, this is now possible since LXD gained its own storage management API a few versions back.

Creating storage pools

A new LXD installation comes without any storage pool defined. If you run lxd init, LXD will offer to create a storage pool for you. The storage pool created by lxd init will be the default storage pool on which containers are created.

asciicast

Creating further storage pools

Our client tool makes it really simple to create additional storage pools. In order to create and administer new storage pools you can use the lxc storage command. So if you wanted to create an additional btrfs storage pool on a block device /dev/sdb you would simply use lxc storage create my-btrfs btrfs source=/dev/sdb. But let’s take a look:

asciicast
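If you can't play the recording, a rough equivalent of such a session might look like this (the pool names and the /dev/sdb source are just examples):

lxc storage create my-btrfs btrfs source=/dev/sdb
lxc storage create my-dir dir
lxc storage list
lxc storage show my-btrfs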

Creating containers on the default storage pool

If you started from a fresh install of LXD and created a storage pool via lxd init, LXD will use this pool as the default storage pool. That means that if you do lxc launch images:ubuntu/xenial xen1, LXD will create a storage volume for the container's root filesystem on this storage pool. In our examples we've been using my-first-zfs-pool as our default storage pool:

asciicast

Creating containers on a specific storage pool

But you can also tell lxc launch and lxc init to create a container on a specific storage pool by simply passing the -s argument. For example, if you wanted to create a new container on the my-btrfs storage pool you would do lxc launch images:ubuntu/xenial xen-on-my-btrfs -s my-btrfs:

asciicast

Creating custom storage volumes

If you need additional space for one of your containers, for example to store additional data, the new storage API lets you create storage volumes that can be attached to a container. This is as simple as doing lxc storage volume create my-btrfs my-custom-volume:

asciicast

Attaching custom storage volumes to containers

Of course this feature is only helpful because the storage API lets you attach those storage volumes to containers. To attach a storage volume to a container you can use lxc storage volume attach my-btrfs my-custom-volume xen1 data /opt/my/data:

asciicast
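For completeness, here is a sketch of the full volume lifecycle using the same storage API commands (the detach and delete steps are shown on the assumption that you'll eventually want to undo the attach):

lxc storage volume create my-btrfs my-custom-volume
lxc storage volume attach my-btrfs my-custom-volume xen1 data /opt/my/data
lxc storage volume list my-btrfs
lxc storage volume detach my-btrfs my-custom-volume xen1 data
lxc storage volume delete my-btrfs my-custom-volume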

Sharing custom storage volumes between containers

By default LXD will make an attached storage volume writable by the container it is attached to. This means it will change the ownership of the storage volume to the container's id mapping. But storage volumes can also be attached to multiple containers at the same time. This is great for sharing data among multiple containers. However, this comes with a few restrictions. In order for a storage volume to be attached to multiple containers they must all share the same id mapping. Let's create an additional container xen-isolated that has an isolated id mapping. This means its id mapping will be unique to this LXD instance such that no other container has the same id mapping. Attaching the same storage volume my-custom-volume to this container will now fail:

asciicast
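If you want to reproduce this yourself, a container with an isolated id mapping can be created by setting the security.idmap.isolated config key; a minimal sketch (the image alias is just an example):

lxc launch images:ubuntu/xenial xen-isolated -c security.idmap.isolated=true
lxc storage volume attach my-btrfs my-custom-volume xen-isolated data /opt/my/data   # this will fail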

But let’s make xen-isolated have the same mapping as xen1 and let’s also rename it to xen2 to reflect that change. Now we can attach my-custom-volume to both xen1 and xen2 without a problem:

asciicast
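A sketch of the same sequence on the command line (lxc move is LXD's rename; the id mapping change takes effect on the next start, so the container is stopped first):

lxc config unset xen-isolated security.idmap.isolated   # back to the shared id mapping
lxc stop xen-isolated
lxc move xen-isolated xen2                               # rename to reflect the change
lxc start xen2
lxc storage volume attach my-btrfs my-custom-volume xen2 data /opt/my/data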

Summary

The storage API is a very powerful addition to LXD. It provides a set of essential features that are helpful in dealing with a variety of problems when using containers at scale. This short introduction hopefully gave you an impression of what you can do with it. There will be more to come in the future.


Read more
Christian Brauner

Storage Tools

Having implemented or at least rewritten most storage backends in LXC as well as LXD has left me under the impression that most storage tools suck. Most advanced storage drivers provide a set of tools that allow userspace to administer storage without having to link against an external library. This is a huge advantage if one wants to keep the number of external dependencies to a minimum, a policy to which LXC and LXD always try to adhere. One of the most crucial features such tools should provide is the ability to retrieve each property of each storage entity they administer in a predictable and machine-readable way. As far as I can tell, only the ZFS and LVM tools allow one to do this. For example

zfs get -H -p -o "value" <key> <storage-entity>

will let you retrieve (nearly) all properties. The RBD and BTRFS tools lack this ability, which makes them inconvenient to use at times.
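The LVM reporting tools are similarly script-friendly. A sketch (vg0/mylv is a hypothetical volume group/logical volume):

lvs --noheadings --units b -o lv_size vg0/mylv
vgs --noheadings --units b -o vg_free vg0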


Read more
Christian Brauner

lxc exec vs ssh

Recently, I've implemented several improvements for lxc exec. In case you didn't know, lxc exec is LXD's client command that uses the LXD client API to talk to the LXD daemon and execute any program the user might want. Here is a small example of what you can do with it:

asciicast

One of our main goals is to make lxc exec feel as similar to ssh as possible, since ssh is the standard for running commands remotely, whether interactively or non-interactively. Making lxc exec behave nicely was tricky.

1. Handling background tasks

A long-standing problem was certainly how to correctly handle background tasks. Here's an asciinema illustration of the problem with a pre-2.7 LXD instance:

asciicast

What you can see there is that putting a task in the background will lead to lxc exec not being able to exit. A lot of sequences of commands can trigger this problem:

chb@conventiont|~
> lxc exec zest1 bash
root@zest1:~# yes &
y
y
y
.
.
.

Nothing would save you now. yes will simply write to stdout till the end of time as quickly as it can…
The root of the problem lies with stdout being kept open, which is necessary to ensure that any data written by the process the user has started is actually read and sent back over the websocket connection we established.
As you can imagine this becomes a major annoyance when you e.g. run a shell session in which you want to run a process in the background and then quickly want to exit. Sorry, you are out of luck. Well, you were.
The first and naive approach is obviously to simply close stdout as soon as you detect that the foreground program (e.g. the shell) has exited. Not quite as good an idea as one might think… The problem becomes obvious when you run quickly-executing programs like:

lxc exec -- ls -al /usr/lib

where the lxc exec process (and the associated forkexec process (Don't worry about it now. Just remember that Go + setns() are not on speaking terms…)) exits before all buffered data in stdout was read. In this case you will get truncated output and no one wants that. After a few approaches to the problem that involved disabling pty buffering (wasn't pretty, I tell you, and it also didn't work predictably) and other weird ideas, I managed to solve this by employing a few poll() "tricks" (in some sense of the word "trick"). Now you can finally run background tasks and cleanly exit. To wit:
asciicast

2. Reporting exit codes caused by signals

ssh is a wonderful tool. One thing, however, that I never really liked was the fact that when the command run by ssh received a signal, ssh would always report -1 aka exit code 255. This is annoying when you'd like to have information about what signal caused the program to terminate. This is why I recently implemented the standard shell convention of reporting any signal-caused exits as 128 + n, where n is the signal number that caused the executing program to exit. For example, on SIGKILL you would see 128 + SIGKILL = 137 (calculating the exit codes for other deadly signals is left as an exercise to the reader). So you can do:

chb@conventiont|~
> lxc exec zest1 sleep 100

Now, send SIGKILL to the executing program (not to lxc exec itself, as SIGKILL is not forwardable):

kill -KILL $(pidof sleep)

and finally retrieve the exit code for your program:

chb@conventiont|~
> echo $?
137

Voila. This obviously only works nicely when a) the exit code doesn't breach the 8-bit wall-of-computing and b) the executing program doesn't use 137 to indicate success (which would be… interesting(?)). Neither objection seems too convincing to me. The former because most deadly signals should not breach the range. The latter because (i) that's the user's problem, (ii) these exit codes are actually reserved (I think.), (iii) you'd have the same problem running the program locally or otherwise.
The main advantage I see in this is the ability to report back fine-grained exit statuses for executing programs. Note, by no means can we report back all instances where the executing program was killed by a signal. E.g. when your program handles SIGTERM and exits cleanly, there's no easy way for LXD to detect this and report that the program was killed by a signal. You will simply receive success aka exit code 0.
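To make the convention concrete with a second signal: SIGTERM is signal 15, so a SIGTERM-killed program should come back as 128 + 15 = 143. A sketch mirroring the SIGKILL example above:

chb@conventiont|~
> kill -l TERM
15
> lxc exec zest1 sleep 100

Then, from another shell, kill -TERM $(pidof sleep), and finally:

chb@conventiont|~
> echo $?
143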

3. Forwarding signals

This is probably the least interesting feature (or maybe it isn't, no idea) but I found it quite useful. As you saw in the SIGKILL case before, I was explicit in pointing out that one must send SIGKILL to the executing program, not to the lxc exec command itself. This is due to the fact that SIGKILL cannot be handled in a program. The only thing the program can do is die… like right now… this instant… sofort… (You get the idea…). But a lot of other signals, SIGTERM, SIGHUP, and of course SIGUSR1 and SIGUSR2, can be handled. So when you send signals that can be handled to lxc exec instead of the executing program, newer versions of LXD will forward the signal to the executing process. This is pretty convenient in scripts and so on.
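As a sketch of what the forwarding buys you, a sequence like this should report the 128 + SIGTERM = 143 convention from the previous section, even though the signal was delivered to lxc exec rather than to sleep:

chb@conventiont|~
> lxc exec zest1 sleep 100 &
> kill -TERM $!          # signal goes to lxc exec, which forwards it to sleep
> wait $!; echo $?
143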

In any case, I hope you found this little lxc exec post/rant useful. Enjoy LXD, it's a crazy and beautiful beast to play with. Give it a try online at https://linuxcontainers.org/lxd/try-it/ and for all you developers out there: check out https://github.com/lxc/lxd and send us patches.

Read more

bigjools

Why?

I recently had cause to get federated logins working on Openstack, using Kerberos as an identity provider. I couldn't find anything on the Internet that described this in a simple way understandable by a relative newbie to Openstack, so this post attempts to do that, because it took me a long time to find and digest all the info scattered around. Unfortunately the actual Openstack docs are a little incoherent at the moment.

Assumptions

  • I’ve tried to get this working on older versions of Openstack but the reality is that unless you’re using Kilo or above it is going to be an uphill task, as the various parts (changes in Keystone and Horizon) don’t really come together until that release.
  • I’m only covering the case of getting this working in devstack.
  • I’m assuming you know a little about Kerberos, but not too much :)
  • I’m assuming you already have a fairly vanilla installation of Kilo devstack in a separate VM or container.
  • I use Ubuntu server. Some things will almost certainly need tweaking for other OSes.

Overview

The federated logins in Openstack work by using Apache modules to provide a remote user ID, rather than credentials in Keystone. This allows for a lot of flexibility but also provides a lot of pain points as there is a huge amount of configuration. The changes described below show how to configure Apache, Horizon and Keystone to do all of this.

Important! Follow these instructions very carefully. Kerberos is extremely fussy, and the configuration in Openstack is rather convoluted.

Pre-requisites

If you don’t already have a Kerberos server, you can install one by following https://help.ubuntu.com/community/Kerberos

The Kerberos server needs a service principal for Apache so that Apache can connect. You need to generate a keytab for Apache, and to do that you need to know the hostname for the container/VM where you are running devstack and Apache. Assuming it's simply called 'devstackhost':

$ kadmin -p <your admin principal>
kadmin: addprinc -randkey HTTP/devstackhost
kadmin: ktadd -k keytab.devstackhost HTTP/devstackhost

This will write a file called keytab.devstackhost; you need to copy it to your devstack host under /etc/apache2/auth/

You can test that this works with:

$ kinit -k -t /etc/apache2/auth/keytab.devstackhost HTTP/devstackhost

You may need to install the krb5-user package to get kinit. If there is no problem, the command prompt just reappears with no error. If it fails, check that you got the keytab filename right and that the principal name is correct. You can also try using kinit with a known user to see if the underlying Kerberos install is right (the realm and the key server must have been configured correctly; installing any kerberos package usually prompts to set these up).

Finally, the keytab file must be owned by www-data and read/write only by that user:

$ sudo chown www-data /etc/apache2/auth/keytab.devstackhost
$ sudo chmod 0600 /etc/apache2/auth/keytab.devstackhost
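Before moving on, you can also sanity-check the keytab's contents (klist comes from the same krb5-user package):

$ sudo klist -k /etc/apache2/auth/keytab.devstackhost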

Apache Configuration

Install the Apache Kerberos module:

$ sudo apt-get install libapache2-mod-auth-kerb

Edit the /etc/apache2/sites-enabled/keystone.conf file. You need to make sure the mod_auth_kerb module is loaded, and add the extra Kerberos config.

LoadModule auth_kerb_module modules/mod_auth_kerb.so

<VirtualHost *:5000>

 ...

 # KERB_ID must match the IdP set in Openstack.
 SetEnv KERB_ID KERB_ID
 
 <Location ~ "kerberos" >
 AuthType Kerberos
 AuthName "Kerberos Login"
 KrbMethodNegotiate on
 KrbServiceName HTTP
 KrbSaveCredentials on
 KrbLocalUserMapping on
 KrbAuthRealms MY-REALM.COM
 Krb5Keytab /etc/apache2/auth/keytab.devstackhost
 # Optional: if 'off', GSSAPI SPNEGO becomes a requirement
 KrbMethodK5Passwd on
 Require valid-user
 </Location>
</VirtualHost>

Note:

  • Don’t forget to edit the KrbAuthRealms setting to your own realm.
  • Don’t forget to edit Krb5Keytab to match your keytab filename
  • Pretty much no browser supports SPNEGO out of the box, so KrbMethodK5Passwd is enabled here, which will make the browser pop up one of its own dialogs prompting for credentials (more on that later). If this is off, the browser must support SPNEGO, which will fetch the Kerberos credentials from your user environment, assuming the user is already authenticated.
  • If you are using Apache 2.2 (used on Ubuntu 12.04) then KrbServiceName must be configured as HTTP/devstackhost (change devstackhost to match your own host name). This config is so that Apache uses the service principal name that we set up in the Kerberos server above.

Keystone configuration

Federation must be explicitly enabled in the keystone config.
http://docs.openstack.org/developer/keystone/extensions/federation.html explains this, but to summarise:

Edit /etc/keystone/keystone.conf and add the driver:

[federation]
driver = keystone.contrib.federation.backends.sql.Federation
trusted_dashboard = http://devstackhost/auth/websso
sso_callback_template = /etc/keystone/sso_callback_template.html

(Change “devstackhost” again)

Copy the callback template to the right place:

$ cp /opt/stack/keystone/etc/sso_callback_template.html /etc/keystone/

Enable kerberos in the auth section of /etc/keystone/keystone.conf :

[auth]
methods = external,password,token,saml2,kerberos
kerberos = keystone.auth.plugins.mapped.Mapped

Set the remote_id_attribute, which tells Openstack which IdP was used:

[kerberos]
remote_id_attribute = KERB_ID

Add the middleware to keystone-paste.conf. ‘federation_extension’ should be the second last entry in the pipeline:api_v3 entry:

[pipeline:api_v3]
pipeline = sizelimit url_normalize build_auth_context token_auth admin_token_auth json_body ec2_extension_v3 s3_extension simple_cert_extension revoke_extension federation_extension service_v3

Now we have to create the database tables for federation:

$ keystone-manage db_sync --extension federation

Openstack Configuration

Federation must use the v3 API in Keystone. Get the Openstack RC file from the API access tab of Access & Security and then source it to get the shell API credentials set up. Then:

$ export OS_AUTH_URL=http://$HOSTNAME:5000/v3
$ export OS_IDENTITY_API_VERSION=3
$ export OS_USERNAME=admin

Test this by trying something like:

$ openstack project list

Now we have to set up the mapping between remote and local users. I’m going to add a new local group and map all remote users to that group. The mapping is defined with a blob of json and it’s currently very badly documented (although if you delve into the keystone unit tests you’ll see a bunch of examples). Start by making a file called add-mapping.json:

[
    {
        "local": [
            {
                "user": {
                    "name": "{0}",
                    "domain": {"name": "Default"}
                }
            },
            {
                "group": {
                    "id": "GROUP_ID"
                    }
            }
        ],
        "remote": [
            {
                "type": "REMOTE_USER"
            }
        ]
    }
]

Now we need to add this mapping using the openstack shell.

openstack group create krbusers
openstack role add --project demo --group krbusers member
group_id=$(openstack group list|grep krbusers|awk '{print $2}')
openstack identity provider create kerb group_id=${group_id}
cat add-mapping.json|sed s^GROUP_ID^${group_id}^ > /tmp/mapping.json
openstack mapping create --rules /tmp/mapping.json kerberos_mapping
openstack federation protocol create --identity-provider kerb --mapping kerberos_mapping kerberos
openstack identity provider set --remote-id KERB_ID kerb

(I’ve left out the command prompt so you can copy and paste this directly)
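At this point it's worth sanity-checking what was created. These list commands (from the same openstack client used above) should show the identity provider, the mapping and the protocol:

openstack identity provider list
openstack mapping list
openstack federation protocol list --identity-provider kerb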

What did we just do there?

In my investigations, the part above took me the longest to figure out due to the current poor state of the docs. But basically:

  • Create a group krbusers to which all federated users will map
  • Make sure the group is in the demo project
  • Create a new identity provider which is linked to the group we just created (the API frustratingly needs the ID, not the name, hence the shell machinations)
  • Create the new mapping, then link it to a new “protocol” called kerberos which connects the mapping to the identity provider.
  • Finally, make sure the remote ID coming from Apache is linked to the identity provider. This makes sure that any requests from Apache are routed to the correct mapping. (Remember above in the Apache configuration that we set KERB_ID in the request environment? This is an arbitrary label but they need to match.)

After all this, we have a new group in Keystone called krbusers that will contain any user provided by Kerberos.

Ok, we’re nearly there! Onwards to …

Horizon Configuration

Web SSO must be enabled in Horizon. Edit the config at /opt/stack/horizon/openstack_dashboard/local/local_settings.py and make sure the following settings are set at the bottom:

WEBSSO_ENABLED = True

WEBSSO_CHOICES = (
("credentials", _("Keystone Credentials")),
("kerberos", _("Kerberos")),
)

WEBSSO_INITIAL_CHOICE="kerberos"

COMPRESS_OFFLINE=True

OPENSTACK_KEYSTONE_DEFAULT_ROLE="Member"

OPENSTACK_HOST="$HOSTNAME"

OPENSTACK_API_VERSIONS = {
"identity": 3
}

OPENSTACK_KEYSTONE_URL="http://$HOSTNAME:5000/v3"

Make sure $HOSTNAME is actually the host name for your devstack instance.

Now, restart apache

$ sudo service apache2 restart

and you should be able to test that the federation part of Keystone is working by visiting this URL

http://$HOSTNAME:5000/v3/OS-FEDERATION/identity_providers/kerb/protocols/kerberos/auth

You’ll get a load of json back if it worked OK.

You can now test the websso part of Horizon by going here:

http://$HOSTNAME:5000/v3/auth/OS-FEDERATION/websso/kerberos?origin=http://$HOSTNAME/auth/websso/

You should get a browser dialog which asks for Kerberos credentials, and if you get through this OK you’ll see the sso_callback_template returned to the browser.

Trying it out!

If you don’t have any users in your Kerberos realm, it’s easy to add one:

$ kadmin -p <your admin principal>
kadmin: addprinc -randkey <NEW USER NAME>
kadmin: cpw -pw <NEW PASSWORD> <NEW USER NAME>

Now visit your Openstack dashboard and you should see something like this:

kerblogin

Click “Connect” and log in and you should be all set.


Read more
mitechie

A couple of people have reached out to me via LinkedIn and reminded me that my three-year work anniversary happened last Friday. Three years since I left my job at a local place to go work for Canonical, where I got the chance to be paid to work on open source software and better my Python skills with the team working on Launchpad. My wife wasn't quite sure. "You've only been at your job a year and a half, and your last one was only two years. What makes this different?"

What's amazing, looking back, is just how *right* the decision turned out to be. I was nervous at the time. I really wasn't Launchpad's biggest fan. However, the team I interviewed with held the promise of making me a better developer. They were doing code reviews of every branch that went up to land. They had automated testing, and they firmly believed in unit and functional tests of the code. The product didn't excite me, but the environment, working with smart developers from across the globe, was exactly what I felt I needed to move forward with my career, my craft.


I joined my team on Launchpad in a squad with four other developers. It was funny. When I joined I felt so lost. Launchpad is an amazing and huge bit of software, and I knew I was in over my head. I talked with my manager at the time, Deryck, and he told me "Don't worry, it'll take you about a year to get really productive working on Launchpad." A year! Surely you jest, and if you're not jesting…wtf did I just get myself into?

It was a long road and over time I learned how to take a code review (a really hard skill for many of us), how to do one, and how to talk with other smart and opinionated developers. I learned the value of the daily standup, how to manage work across a kanban board. I learned to really learn from others. Up until this point I’d always been the big fish in a small pond and suddenly I was the minnow hiding in the shallows. Forget books on how to code, just look at the diff in the code review you’re reading right now. Learn!

My boss was right: it was nearly ten months before I really felt like I could be asked to do most things in Launchpad and get them done in an efficient way. Soon our team moved on from Launchpad to other projects. It was actually pretty great. On the one hand, "Hey! I just got the hang of this thing", but on the other hand, we were moving on to new things. Development life here has never been one of sitting still. We sit down and work on the Ubuntu cycle of six-month plans, and it's funny because even that is such a long time. Do you really know what you'll be doing six months from now?


Since that time on Launchpad I've gotten to work on several different projects, and I ended up switching teams to work on the Juju GUI. I didn't really know a lot about this Juju thing, but the GUI was a fascinating project. It's a really large-scale JavaScript application. This is no "toss some jQuery on a web page" thing here.

I also moved to work under a new manager, Gary. He was my second manager since starting at Canonical, and I was amazed at my luck. Here I've had two great mentors who made huge strides in teaching me how to work with other developers, how to do the fun stuff and the mundane, and how to take pride in the accomplishments of the team. I sit down at my computer every day with the brain power of amazing people at my disposal over irc, Google Hangouts, email, and more. It's amazing to think that at these sprints we do, I'm pretty much never the smartest person in the room. However, that's what's so great. It's never boring, and when there's a problem the key is that we put our joint brilliant minds to it. In every hard problem we've faced I've never found that a single person had the one true solution. What we come up with together is always better than what any of us had apart.

When Gary left there was a void for team lead, and it was something I was interested in. I really can't say enough awesome things about the team of folks I work with. I wanted to keep us all together and I felt like it would be great for us to try to keep things going. It was kind of a "well I'll just try not to $#@$@# it up" situation. That was more than nine months ago now. Gary and Deryck taught me so much, and I still have to bite my tongue and ask myself "What would Gary do" at times. I've kept some things the same, but I've also brought my own flavor into the team a bit, at least I like to think so. These days my Github profile doesn't show me landing a branch a day, but I take great pride in the progress of the team as a whole each and every week.

The team I run now is as awesome a group of people, the best I could hope to work for. I do mean that, I work for my team. It’s never the other way around and that’s one lesson I definitely picked up from my previous leads. The projects we’re working on are exciting and new and are really important to Canonical. I get to sit in and have discussions and planning meetings with Canonical super genius veterans like Kapil, Gustavo, and occasionally Mark Shuttleworth himself.

Looking back, I've spent the last three years becoming a better developer, getting an on-the-job training course on leading a team of brilliant people, and a crash course on thinking about the project, not just as the bugs or features for the week, but as it needs to exist in three to six months. I've spent three years bouncing between "what have I gotten myself into, this is beyond my abilities" and "I've got this. You can't find someone else to do this better". I always tell people that if you're not swimming as hard as you can to keep up, find another job. I feel like three years ago I did that and I've been swimming ever since.


Three years is a long time in a career these days. It's been a wild ride and I can't thank the folks that let me in the door, taught me, and have given me the power to do great things with my work enough. I've worked my butt off in Budapest, Copenhagen, Cape Town, Brussels, North Carolina, London, Vegas, and the bay area a few times. Will I be here three years from now? Who knows, but I know I've got an awesome team to work with on Monday and an awesome product to keep building. I'm going to really enjoy doing work that's challenging and fulfilling every step of the way.



Read more
bigjools

New MAAS features in 1.7.0

MAAS 1.7.0 is close to its release date, which is set to coincide with Ubuntu 14.10’s release.

The development team has been hard at work and knocked out some amazing new features and improvements. Let me take you through some of them!

UI-based boot image imports

Previously, MAAS used to require admins to configure (well, hand-hack) a yaml file on each cluster controller that specified precisely which OSes, releases and architectures to import. This has all been replaced with a very smooth new UI that lets you simply click and go.

New image import configuration page


The different images available are driven by a “simplestreams” data feed maintained by Canonical. What you see here is a representation of what’s available and supported.

Any previously-imported images also show on this page, and you can see how much space they are taking up, and how many nodes got deployed using each image. All the imported images are automatically synced across the cluster controllers.

image-import

Once a new selection is clicked, “Apply changes” kicks off the import. You can see that the progress is tracked right here.

(There’s a little more work left for us to do to track the percentage downloaded.)

Robustness and event logs

MAAS now monitors nodes as they are deploying and lets you know exactly what’s going on by showing you an event log that contains all the important events during the deployment cycle.

node-start-log

You can see here that this node has been allocated to a user and started up.

Previously, MAAS would have said "okay, over to you, I don't care any more" at this point, which was pretty useless when things started going wrong (and it's not just hardware that goes wrong; preseeds often fail).

So now, the node’s status shows “Deploying” and you can see the new event log at the bottom of the node page that shows these actions starting to take place.

After a while, more events arrive and are logged:

node-start-log2

And eventually it’s completely deployed and ready to use:

node-start-log3

You’ll notice how quick this process is nowadays.  Awesome!

More network support

MAAS has nascent support for tracking networks/subnets and attached devices. Changes in this release add a couple of neat things: cluster interfaces automatically have their networks registered in the Networks tab ("master-eth0" in the image), and any node network interfaces known to be attached to any of these networks are automatically linked (see the "attached nodes" column). This means even less work for admins to set things up, and makes it easier for users to rely on networking constraints when allocating nodes over the API.

networks

Power monitoring

MAAS now tracks whether power is applied to your nodes, right in the node listing. Black means off, green means on, and red means there was an error trying to find out.

powermon

Bugs squashed!

With well over 100 bugs squashed, this will be a well-received release.  I’ll post again when it’s out.


Read more
bigjools

While setting up my new NUCs to use with MAAS as a development deployment tool, I got very, very frustrated with the initial experience so I thought I’d write up some key things here so that others may benefit — especially if you are using MAAS.

First hurdle — when you hit ctrl-P at the boot screen it is likely to not work. This is because you need to disable the num lock.

Second hurdle — when you go and enable the AMT features it asks for a new password, but doesn’t tell you that it needs to contain upper case, lower case, numbers AND punctuation.

Third hurdle — if you want to use it headless like me, it’s a good idea to enable the VNC server.  You can do that with this script:

AMT_PASSWORD=<fill me in>
VNC_PASSWORD=<fill me in>
IP=N.N.N.N
wsman put http://intel.com/wbem/wscim/1/ips-schema/1/IPS_KVMRedirectionSettingData -h ${IP} -P 16992 -u admin -p ${AMT_PASSWORD} -k RFBPassword=${VNC_PASSWORD} &&\
wsman put http://intel.com/wbem/wscim/1/ips-schema/1/IPS_KVMRedirectionSettingData -h ${IP} -P 16992 -u admin -p ${AMT_PASSWORD} -k Is5900PortEnabled=true &&\
wsman put http://intel.com/wbem/wscim/1/ips-schema/1/IPS_KVMRedirectionSettingData -h ${IP} -P 16992 -u admin -p ${AMT_PASSWORD} -k OptInPolicy=false &&\
wsman put http://intel.com/wbem/wscim/1/ips-schema/1/IPS_KVMRedirectionSettingData -h ${IP} -P 16992 -u admin -p ${AMT_PASSWORD} -k SessionTimeout=0 &&\
wsman invoke -a RequestStateChange http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_KVMRedirectionSAP -h ${IP} -P 16992 -u admin -p ${AMT_PASSWORD} -k RequestedState=2

(wsman comes from the wsmancli package)

But there is yet another gotcha!  The VNC_PASSWORD must be no more than 8 characters and still meet the same requirements as the AMT password.
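Once the script has run successfully, you can sanity-check the result by pointing any VNC client at display 0 (port 5900), for example:

$ vncviewer N.N.N.N:0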

Once this is all done you should be all set to use this very fast machine with MAAS.


Read more

At UDS last week there was another "Testing in Ubuntu" session. During the event I gave a brief presentation on monitoring and testability. The thesis was that there are a lot of parallels between monitoring and testing, so many that it's worth thinking of monitoring as a type of testing at times. Because of that, great monitoring requires a testable system, as well as thinking about monitoring right at the start, so that you build a monitorable system as well as a testable one.

You can watch a video of the talk here. (Thanks to the video team for recording it and getting it online quickly.)

I have two main questions. Firstly, what are the conventional names for the "passive" and "active" monitoring that I describe? Secondly, do you agree with me about monitoring?

Read more
rvr

Fernando Tricas always has interesting things to say. In a recent post he talks about The life of links and digital content (Spanish):

«We tend to assume that digital [content] is forever. But anyone who accumulates enough information also knows that sometimes it's difficult to find it, in other cases it breaks and, of course, there is a non-zero probability that things go wrong when hosted by third-party services. It is an old topic here; remember Will we have all this information in the future?. The topic resurfaces as news in the light of the article A Year After the Egyptian Revolution, 10% of Its Social Media Documentation Is Already Gone».

In the comments, Anónima said: «Given a time t and an interval Δt, the larger Δt, the more likely it is that all the information from time t-Δt you want to find is gone». This sounded like a statement worth checking, so I decided to do an experiment with my del.icio.us bookmarks.

In delicious.com/rvr I have archived around 4000 links since 2004. So I downloaded the backup file, an HTML file with all the links and their metadata (date, title, tags). I developed a Python script to process this file: it goes through the links and saves each one's current status (whether the link is alive or not). Another script processed the statuses to generate the statistics. These are the results:

(Chart: proportion of dead links by bookmark year)

As can be seen, there is a correlation between the age of the links and the probability of being dead. To reach the 10% figure cited for the Egyptian revolution, in the case of my delicious account we must go back three years (2009). And six years back, a quarter of the links are now defunct. Of course, the sample is very small and may not be representative. It would be interesting to compare it with other accounts and to extend the time span: how many links are still alive after 10 or 15 years? Is it the same with information stored in other media? Are all these dead links resting in peace in a forgotten Google cache disk?

I imagine that sometime in the future, librarians will begin to worry not only about digitizing documents from the remote past, but also about preserving those of the present.

In case you are interested, the code to generate this data is available at github.com/vrruiz/delicious-death-links. The spreadsheet is also available in Google Docs.
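If you'd rather not dig through the repository, the core of the link check can be sketched in a few lines of shell with curl (the Python scripts in the repo do the real work; this is just the flavour):

#!/bin/sh
# Read one URL per line on stdin and print its HTTP status code.
# 000 means the connection itself failed (dead host, timeout, etc.).
while read -r url; do
    code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$url")
    echo "$code $url"
done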

Read more
mitechie

Since joining the Launchpad team my email has been flooded. I’ve always been pretty careful to keep my email clean and I’ve been a bit overwhelmed with all the new mailing lists. There are a bunch of people working on things, as you can imagine. So the email never stops. I’m still working on figuring out what I need to know, what I can ignore, and what should be filed away for later.

Another thing I’m finding is that I’ve got emails in both of my accounts around a single topic. For instance, I have to do some traveling. I’ve got emails on both my Gmail (personal) and Canonical (work) accounts that I really want to keep together in a single travel bucket.

I currently have offlineimap pull both my work and personal accounts down into a single folder on my machine, ~/.email/. So I've got a ~/.email/work and a ~/.email/personal. I then use mutt to open the root there and work through email. It works pretty well. Since I really wanted a global "travel" folder, I figured I'd just create one. So that works. I end up with a directory structure like:

  • personal
  • travel
  • work

The problem

Of course the issue here is that when offlineimap runs again, it sees that the email is no longer in the personal or work accounts and removes it from the server. And the travel folder isn't part of any server-side account, so it's not backed up or synced anywhere. This means Gmail no longer sees those messages, my phone no longer sees them, and I've got no backups. Oops!

Solution start

So to fix that, my new directory structure needed to become an account. So I set up dovecot on my colo server. This way I could have an imap account that I could do whatever with. To get my email in there, I set up offlineimap on my colo to pull personal and work down as I had on my laptop. So I still have things in a ~/.email that's from the accounts, and then dovecot keeps all of my email in ~/email (not a hidden dir). Then I symlinked ~/.email/personal/INBOX to ~/email/personal and did the same with the work account. Now the two accounts are just extra folders in my dovecot setup.

So there we go: the colo is pulling my email, and I changed my laptop's offlineimap to sync with the new dovecot server. In this way, I've got a single combined email account on my laptop using mutt. I then also set up my phone with an imap client to talk directly to the dovecot server. Sweet, this is getting closer to what I really want.
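For reference, a minimal sketch of the laptop-side offlineimap config for this setup (the account name, host and user here are made up; adjust to your own dovecot server):

offlineimap config:

[general]
accounts = combined

[Account combined]
localrepository = combined-local
remoterepository = dovecot

[Repository combined-local]
type = Maildir
localfolders = ~/.email

[Repository dovecot]
type = IMAP
remotehost = colo.example.com
remoteuser = rharding
ssl = yes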

Issues start, who am I

Of course, once this started working I realized I had to find a way to make sure I sent email as the right person. I'd previously just told mutt to use my personal address if I was in the personal account and my work address if in the work account. Fortunately, we can make mutt a bit more intelligent about things.

First, we want to have mutt check the To/CC headers to determine who the email was to; if it was me, mutt uses that address as the From during replies.

mutt config:

# I have to set these defaults because when you first startup mutt
# it's not running folder hooks. It just starts in a folder
set from="rharding@mitechie.com"
# Reply with the address used in the TO/CC header
set reverse_name=yes
alternates "rick.stuff@canonical.com|deuce868@gmail.com"

This is a start, but it fails when sending new email; mutt is not sure who I should be. So I want a way to manually switch who the active From user is. These macros give me the ability to swap using the keybindings Alt-1 and Alt-2.

mutt config:

macro index,pager \e1 ":set from=rharding@mitechie.com\n:set status_format=\"-%r-rharding@mitechie.com: %f [Msgs:%?M?%M/?%m%?n? New:%n?%?o? Old:%o?%?d? Del:%d?%?F? Flag:%F?%?t? Tag:%t?%?p? Post:%p?%?b? Inc:%b?%?l? %l?]---(%s/%S)-%>-(%P)---\"\n" "Switch to rharding@mitechie.com"
macro index,pager \e2 ":set from=rick.stuff@canonical.com\n:set status_format=\"-%r-rick.stuff@canonical.com: %f [Msgs:%?M?%M/?%m%?n? New:%n?%?o? Old:%o?%?d? Del:%d?%?F? Flag:%F?%?t? Tag:%t?%?p? Post:%p?%?b? Inc:%b?%?l? %l?]---(%s/%S)-%>-(%P)---\"\n" "Switch to rick.stuff@canonical.com"

That's kind of cool, and it shows in the top of my window who I am set to. Hmm, but even that fails if I've started an email and want to switch who I am on the fly. There is a way to change that though, so another macro to the rescue, this time for the compose ui in mutt.

mutt config:

macro compose \e1 "<esc>f ^U Rick Harding <rharding@mitechie.com>\n"
macro compose \e2 "<esc>f ^U Rick Harding <rick.stuff@canonical.com>\n"

There, now even if I’m in the middle of creating an email I can switch who it’s sent as. It’s not perfect, and I know I’ll screw up at some point, but hopefully this is close enough.

Firming up with folder hooks

Finally, if I know the folder I’m in is ONLY for one account or the other, I can use folder hooks to fix that up for me.

mutt config:

folder-hook +personal.* set from="rharding@mitechie.com"
folder-hook +personal.* set signature=$HOME/.mutt/signature-mitechie
folder-hook +personal.* set query_command='"goobook query \'%s\'"'

So there, if I’m in my personal account, set the from, the signature, and change mutt to complete my addresses from goobook instead of the ldap completion I use for work addresses.

Not all roses

There are still a few issues. I lose webmail. After all, mail goes into my Gmail Inbox and then from there into various folders of my dovecot server. Honestly though, I don’t think this will be an issue. I tend to use my phone more and more for email management so as long as that works, I can get at things.

I also lose Gmail search for a large portion of my email. Again, it’s not killer. On my laptop I’ve been using notmuch (Xapian backed) for fulltext search and it’s been doing a pretty good job for me. However, I can’t run that on my phone. So searching for mail on there is going to get harder. Hopefully having a decent folder structure will help though.

I’ve also noticed that the K-9 mail client is a bit flaky with syncing changes up on things. Gmail, mutt, and I’ve also setup Thunderbird all seem to sync up ok without issue, so I think this is K-9 specific.

That brings up the issue of creating new folders. Offlineimap won’t pick up new folders I create from within mutt. It won’t push those up as new imap folders for some reason. I have to first create them using thunderbird, which sets up the folder server side for me. Then everything works ok. It’s a PITA, but hopefully I can find a better way to do this. Maybe even a Python script to hook into a mutt macro or something.

Wrap Up

So there we are. Next up is to set up imapfilter to help me pre-filter the email as it comes in. Now that all email is in one place that should be nice and easy. I can run that on my colo server and it'll be quick.

This is obviously more trouble than most people want to go through to setup email, but hey, maybe someone will find this interesting or have some of their own ideas to share.


Read more

Today is the first day of the Linaro Connect in Cambridge. Linaro has gathered to spend a week talking, coding and having fun.

The Infrastructure team is spending most of the week coding, on a few select topics, chosen to make good use of the time that we have together.

In order to help us focus on our goals for the week I've put together a hard copy version of status.linaro.org.

/images/connect-progress-start.jpg

We'll be updating it during the week as we make progress. I'll report back on how it looks at the end of the week.

Read more
rvr

Summary

  • Brief analysis of 150,000 photographs from Flickr in the province of Malaga.
  • It identifies the profile and preferences of tourists.

Last Saturday I was in Malaga. I was invited by Sonia Blanco and the Universidad Internacional de Andalucia to participate in a workshop on Tourism and Social Networks. Sonia is a professor at the University of Malaga, and one of the oldest bloggers in the Spanish blogosphere. Sonia asked me to present the analysis Fernando Tricas and I did about Flickr photos and the Canary Islands (2009-2010), and I gladly accepted. I wanted to bring an update, so we got to work to make a short presentation with data from the province of Malaga. And that's what is shown below.

Video

Last Thursday, with the presentation already made, Fernando passed me an interesting link, a visualization by the Wall Street Journal that shows the density of a week of Foursquare check-ins in New York. If the WSJ could do it, so could we ;) We already had the data and the map algorithms, so we generated the maps by month and joined them to build the animation.

The video below shows the density of photographs taken in the province of Malaga from 2004 to 2010. Blue areas are places where a few pictures were taken, and red areas are places where many pictures were taken. There are areas with many photographs, places of touristic interest. And of course, there are months where the activity is higher or lower.

Data

The video is just a bit of whole presented analysis. Full version is available below.

As you may know, Flickr is a popular photo-sharing service with 5 billion hosted images and 86 million unique visitors. Flickr has social networking features, since it allows users to make contacts. Flickr can play a role in the promotion of tourist destinations, as it is one of the main sources of images on the Internet. But to us, Flickr is a huge source of data: which are the most photogenic places? Who is taking pictures there? These and other questions can be answered using data mining.

For this study we obtained the metadata of 175,000 photographs (62,000 geolocated), 7,900 photographers and 1,470,000 tags (47,000 unique). All these pictures were either marked by the tag "malaga" or GPS coordinates were inside the province of Malaga.

Analysis

Below are the five most relevant slides: the tag cloud, the number of photos and photographers by month, the top 10 countries of the geolocated photographers, the groups of tags and heatmaps of the geolocated images.

  • Turismo-malaga-11
  • Turismo-malaga-17
  • Turismo-malaga-13
  • Turismo-malaga-15
  • Turismo-malaga-20

According to those who share photos on Flickr about Malaga, we can conclude that:

  • The high season in Málaga is August (also, in April there is a Holy Week effect).
  • Users come mainly from UK, USA, Italy, Germany, Madrid and Andalusia. (USA is probably overrepresented compared to real visitors).
  • They are interested in photography, beaches, festivals, fairs, nature, sea, birds, sky, parks.
  • Pictures are taken mainly in Málaga (capital), Ronda, Barcenilla and Benalmadena.

The full presentation slides show more features, such as geolocated photographs by country. It is interesting to compare these data with the previous study on the Canaries. A more detailed analysis could be done, but the roundtable had limited time. This sneak peek shows the potential of social networking and geolocation services for market research. If you have any questions, ask in the comments!

The presentation and images have a Creative Commons Attribution-Share Alike license.

Finally, my gratitude to the organization of the UNIA for the invitation and hospitality, to Daniel Cerdan for suggesting the title of the post, and to Fernando Tricas for his unconditional support.

Read more

If you are an application developer and you want to distribute your new application for a linux distribution, then you currently have several hurdles in your path. Beyond picking which one to start with, you either have to learn a packaging format well enough that you can do the work yourself, or find someone that can do it for you.

At the early stages though neither of these options is particularly compelling. You don't want to learn a packaging format, as there is lots of code to write, and that's what you want to focus on. Finding someone to do the work for you would be great, but there are far more applications than skilled packagers, and convincing someone to help you with something larval is tough: there are going to be a lot of updates, with plenty of churn, to stay on top of, and it may be too early for them to tell if the application will be any good.

This is where pkgme comes in. This is a tool that can take care of the packaging for you, so that you can focus on writing the code, and skilled packagers can focus on packages that need high-quality packaging as they will have lots of users.

This isn't a new idea, and there are plenty of tools out there to generate the packaging for e.g. a Python application. I don't think it is a particularly good use of developer time to produce tools like that for every language/project type out there.

Instead, a few of us created pkgme. This is a tool in two parts. The first part knows about packaging, and how to create the necessary files to build a working package, but it doesn't know anything about your application. This knowledge is delegated to a backend, which doesn't need to understand packaging, and just needs to be able to tell pkgme certain facts about the application.

pkgme is now at a stage where we would like to work with people to develop backends for whatever application type you would like (Python/Ruby on Rails/GNOME/KDE/CMake/Autotools/Vala, etc.) You don't have to be an expert on packaging, or indeed on the project type you want to work on. All it takes is writing a few scripts (in whatever language makes sense), which can introspect an application and report things such as the name, version, dependencies, etc.
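To give a feel for what such a backend script might look like, here is a purely illustrative sketch (the exact file names and protocol are defined by pkgme's documentation; this assumes a Python project with a setup.py):

#!/bin/sh
# Hypothetical backend script reporting the application's name
# by asking the project's own build machinery.
exec python setup.py --name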

If this sounds like something that you would like to do then please take a look at the documentation, write the scripts, and then submit your backend for inclusion in pkgme.

You can also contact the developers, see the nascent website at pkgme.net, or visit the Launchpad page. (We are also very interested in help with the website and documentation if that is where your skills or interests lie.)

Read more
rvr

Abstract.

  • The Cablegate set is composed of +250,000 diplomatic cables.
  • The total number sent by Embassies and the Secretary of State is estimated.

One of the biggest mysteries in astrophysics is dark matter. Dark matter cannot be seen; it neither shines nor reflects light. But we infer its existence because dark matter has mass, and modifies the paths of stars and galaxies. Cablegate has its own dark matter.

According to WikiLeaks, 251,287 communications compose the Cablegate. But what is the real volume of cables between the Embassies and the Secretary of State? Can we estimate it? The answer is yes, there is a simple way to do it. Using the methodology explained below, the total number of communications between the Embassies and the Secretary of State is estimated.

These are the results.

The dark matter of the Embassies.

20101224cablegate-darkmatter.001

Between 2005-2009, more than 400,000 non-leaked cables are identified. In this case, the uncertainty is larger than with just one embassy due to the small number of released cables. The sum increased by 50% in just one week.

Curiously, the average size of the 1800 published cables is 12 KB. If this average is representative of the whole set, something I doubt, the total size of the 250,000 messages would be around 3 GB.

Secretary of State.

In addition to embassies' communications, Cablegate has some cables from the Secretary of State. These messages are often quite interesting, because they request information or send commands to the embassies (eg 09STATE106750).

20101224cablegate-darkmatter.002

In 2005 and 2006 there are no released cables, and therefore the sum cannot be estimated. But between 2007 and 2009, the volume of cables sent by the Secretary of State is remarkable (so big that I doubted the record number was an ordinal number rather than a more sophisticated identifier). Compare this graph with the one for the embassies. 2007 shows more cables from the Secretary than all Embassies combined, but beware, because this trend can be reversed with better data.

These results are available in Google Docs.

Madrid Embassy.

This is the chart for Madrid Embassy, which ranks seventh in the number of leaked cables.

20101224cablegate-darkmatter.003

Between 2004-2009, the existence of at least 17,000 dispatches sent from Madrid can be deduced. In the same period, there are just 3500 leaked cables. The graph shows the breakdown by year. 2007 is leaked in a high percentage, the opposite in 2004 and 2005. Also, the number of communications decreases progressively (Why? Maybe other networks are used instead of SIPRNet). The complete table is available in Google Docs.

Cablegate Dark Matter Howto

The Guardian published a text file with the dates, sources and tags of the 250,000 diplomatic cables included in the Cablegate. The content of these messages is being slowly released. (Using these short descriptions, I did an analysis of the messages related to Spain -tagged as SP- and suggested the existence of communications related to the 2004 Madrid bombings and the Spanish Internet Law. Later, El País published these cables, confirming the suspicions.)

To infer the volume of communications, the methodology is quite simple. Each cable has an identifier. For example, 04MADRID893 summarises the Madrid bombings of March 11th, 2004. This identifier can be broken into three parts:

  • 04: Current year (2004).
  • MADRID: Origin (the Embassy in Madrid)
  • 893: Record number?

What's that record number? Let's investigate. There are some cables sent in December 2004 from the Madrid Embassy, such as 04MADRID4887 (dated December 29, 2004). Its record number is "4887". Another message sent in February has ID 04MADRID527, record number "527". Looking at other cables dated in January, it seems obvious that the record number starts at 1 and goes up, one by one, through the year. The record number is a simple ordinal value. Thanks to this simple rule, and reading the last cables of the Madrid Embassy in December 2004, we know it sent ~4900 cables that year alone.

Ideally, the last cable of the year from each Embassy would be available, but the Cablegate data is not complete. Just a fraction of the leaked messages has been published so far, and those last cables of the year may not be in Cablegate anyway. But, as can be seen in the graphics, this method allows us to make an approximation.

The code used for the calculations is available at github (cablegate-sp) and has a BSD license.

Out of sight, out of mind.

One month after the first cable release, only two thousand messages have been published. At this rate it will take a decade to release all the Cablegate content. Maybe not all messages are as relevant as those released so far, eg boring messages about visas. But if WikiLeaks has raised such a stir with just 2000 cables, I cannot imagine what other secrets remain in the thousands still unreleased (although top-secret cables use other networks).

Anyway, I'm sure there is still a lot of data mining job to do with the cables.

(Spanish version of this article: Cablegate: Lo que no está en WikiLeaks).

PS (December 30th, 2010): Ricardo Estalmán linked to this entry on Wikipedia about the German tank problem during World War II:

«Suppose one is an Allied intelligence analyst during World War II, and one has some serial numbers of captured German tanks. Further, assume that the tanks are numbered sequentially from 1 to N. How does one estimate the total number of tanks?»

The Cablegate case is quite similar. I will update the estimation with the formula cited in the above article, as soon as possible (Xmas days!).
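For reference, the estimator discussed in that Wikipedia article (the minimum-variance unbiased one) is, in LaTeX notation:

\hat{N} = m\left(1 + \frac{1}{k}\right) - 1

where m is the largest record number observed for an embassy in a given year and k is the number of cables sampled from it. Applied per embassy and year, it should refine the sums above.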

Read more

This was the confusing part when I first ran couchapp to create a new app, I couldn't really see where the "entry point" of the app was. In the hope that it might help someone else I'm going to present a quick overview of the default setup.

index.html

The index.html page is a static attachment, and the user starts by requesting it with their browser.

It has some small amount of static HTML, part of which creates a div for the javascript to put the data in.

Either inline, or in an included file, there is a small bit of javascript that will initialise the couchapp.

By default this will use the div with the id items, and will attach an evently widget to it.

evently

The evently widget that is attached will then either have an _init event, or a _changes event, either of which will be immediately run by evently.

This event will usually make a couchdb query to get data to transform to HTML and present to the user (see part three for how this works.)

Once that data has been displayed to the user, any combination of evently widgets or javascript can be used to make further queries and build an app that works however you like.

Previous installments

See part one, part two, and part three.

Read more

Introducing soupmatchers

jml just announced testtools 0.9.8 and in it mentioned the soupmatchers project that I started. Given that I haven't talked about it here before, I wanted to do a post to introduce it, and explain some of the rationale behind it.

soupmatchers is a library for unit testing HTML, allowing you to assert that certain things are present or not within an HTML string. Asserting this based on substring matching is going to be too fragile to be usable, and so soupmatchers works on a parsed representation of the HTML. It uses the wonderful BeautifulSoup library for parsing the HTML, and allows you to assert the presence or not of tags based on the attributes that you care about.

self.assertThat(some_html,
                HTMLContains(Tag('testtools link', 'a',
                                 attrs={'href': 'https://launchpad.net/testtools'})))

You can see more examples in the README.

Basing this on the testtools matchers framework allows you to do this in a semi-declarative way. I think there is a lot of potential here to improve your unit tests. For instance, you can start to build a suite of matchers tailored to talking about the HTML that your application outputs. You can have matchers that match areas of the page, and then talk about other elements relative to them ("this link is placed within the sidebar"). One thing that particularly interests me is to create a class hierarchy that allows you to test particular things across your application. For instance, you could have an ExternalLink class that asserts that a particular class is set on all of your external links. Assuming that you use this at the appropriate places in your tests, you will know that the style applied to that class will be on all external links. Should you wish to change the way that external links are represented in the HTML, you can change the one class and your tests should tell you all the places where the code has to be updated.

Please go ahead and try the library and let me know how it could be improved.

Read more