Canonical Voices

Posts tagged with 'politics'

Michael Hall

When the topic of contributions to FOSS comes up, people usually focus entirely on the aspect of creation, specifically code creation, to the exclusion of all others.  In the context of software this makes a certain amount of sense, since the primary product is the code itself, in either source or binary form.  Even the more broadly focused, who make a point of expanding their definition to include things like documentation and artwork, still focus exclusively on the creation of those works.  And yet perhaps the single biggest factor in increasing the creation of code is the distribution of what is being created.

There are a number of reasons for people to write new code.  We often talk about a developer "scratching their own itch", but other times it can be a matter of personal improvement, monetary gain, or even just plain fun.  While there are many reasons to write code, there are not so many reasons for releasing it under a Free or Open Source license.  By choosing such a license, the author makes explicit that he or she wants the creation to be used by others, in fact by as many others as possible.  The use of their creation is what motivates them, and it stands to reason that the more it is used, the more motivating it becomes to create.  The underlying reason why this is motivating can vary, but the fact is that creators of FOSS are motivated by the use of FOSS, and the more users there are, the more motivation there will be for creating it.

The number and variety of potential consumers of FOSS is larger than any single developer can hope to reach.  Even a group of developers, even a large group of them, will find it impossible to make their creations available to the widest possible audience.  And the more effort they put into making their creation available, the less time and resources they have to put back into creating new things.  Likewise, the smaller the pool of potential consumers, the less reason developers have to improve on or create something in the first place.  But by choosing an open source license, developers separate the work of distribution from that of creation.  The desire for their creation will then naturally lead a much larger number of individuals and groups to bring these creations to the people who want them.  More importantly, because these new groups focus exclusively on the task of distributing, they can afford to carry not just one project but a multitude of them, increasing the consumption of each creation.  And with an increase in consumption, it is reasonable to expect an increase in contributions.

The default application selection for each Ubuntu release is often the subject of much discussion and advocacy.  People called for the inclusion of Banshee long before Ubuntu made the switch.  It's unimaginable that people who like a project and appreciate its developers would actively seek to have it used by an organization that contributed nothing back.  Likewise, when it was announced that Ubuntu would switch back to Rhythmbox, those same advocates genuinely believed that they had lost something, again something unimaginable if they weren't gaining something valuable from the distribution.  When PiTiVi was selected as a default application, advocates for OpenShot made a very strong case for why their preferred application should be included, again because they knew that the project would gain something of value from the increased distribution.  The same happened with F-Spot and Shotwell, with the removal of the GIMP, with the various boot splash systems, and more.  I can only assume that the same happens in other distributions.  The only reason why this would happen is if, whether consciously or not, people see a real value, as real as the value of code contributions, in being distributed as widely as possible.

By relieving the developers of the need to put resources into distribution, distributors allow them to create more using the same commitment of time and resources.  Likewise, by increasing the number of people who will be using it, the distributors multiply the motivating value, whatever it may be, that the developer gets in return.  And as the motivation for creating increases, the number of people who participate in creating also increases.  In this way, every distributor of Free and Open Source software contributes towards increasing the total number of creators and creations (including lines of code written), and they do so in direct proportion to the expansiveness of their distribution.

rvr

Abstract.

  • The Cablegate set is composed of more than 250,000 diplomatic cables.
  • The total number of cables sent by the embassies and the Secretary of State is estimated.

One of the biggest mysteries in astrophysics is dark matter. Dark matter cannot be seen: it neither shines nor reflects light. But we infer its existence because dark matter has mass, and its gravity modifies the paths of stars and galaxies. Cablegate has its own dark matter.

According to WikiLeaks, Cablegate comprises 251,287 communications. But what is the real volume of cables exchanged between the embassies and the Secretary of State? Can we estimate it? The answer is yes: there is a simple way to find out. Using the methodology explained below, the total number of communications between the embassies and the Secretary of State can be estimated.

These are the results.

The dark matter of the Embassies.

Between 2005 and 2009, more than 400,000 non-leaked cables can be identified. The uncertainty here is larger than for a single embassy because so few cables have been released: the estimated total grew by 50% in just one week.

Curiously, the average size of the 1,800 published cables is 12 KB. If this average were representative of the whole set, something I doubt, the 250,000 messages would total about 3 GB (250,000 × 12 KB).

Secretary of State.

In addition to the embassies' communications, Cablegate includes some cables from the Secretary of State. These messages are often quite interesting, because they request information from the embassies or send them instructions (e.g. 09STATE106750).

In 2005 and 2006 there are no released cables from the Secretary of State, so the total cannot be estimated. Between 2007 and 2009, however, the volume of cables sent by the Secretary of State is remarkable (so large that I wondered whether the record number really was an ordinal number rather than a more sophisticated identifier). Compare this graph with the one for the embassies: in 2007 the Secretary of State sent more cables than all the embassies combined, but beware, because this trend may reverse with better data.

These results are available in Google Docs.

Madrid Embassy.

This is the chart for Madrid Embassy, which ranks seventh in the number of leaked cables.

Between 2004 and 2009, the existence of at least 17,000 dispatches sent from Madrid can be deduced. In the same period, there are just 3,500 leaked cables. The graph shows the breakdown by year. A high percentage of the 2007 cables has leaked, while the opposite is true for 2004 and 2005. The number of communications also decreases progressively (why? Maybe other networks are used instead of SIPRNet). The complete table is available in Google Docs.

Cablegate Dark Matter Howto

The Guardian published a text file with the dates, sources and tags of the 250,000 diplomatic cables included in Cablegate. The contents of these messages are being released slowly. (Using these short descriptions, I analysed the messages related to Spain, tagged SP, and suggested the existence of communications related to the 2004 Madrid bombings and the Spanish Internet law. Later, El País published these cables, confirming those suspicions.)

The methodology for inferring the volume of communications is quite simple. Each cable has an identifier. For example, 04MADRID893 summarizes the Madrid bombings of March 11th, 2004. This identifier can be broken into three parts:

  • 04: Year the cable was sent (2004).
  • MADRID: Origin (the Embassy in Madrid).
  • 893: Record number?
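The three-part split can be expressed as a small parser. This is a minimal Python sketch, not the code from cablegate-sp: it assumes every ID is a two-digit year, an origin written in capital letters only, and a trailing number, and the helper name parse_cable_id and the 50/50 century pivot are hypothetical choices for illustration.

```python
import re

# Assumed layout: <2-digit year><ORIGIN in capitals><record number>, e.g. "04MADRID893".
# Origins containing digits or lowercase letters would NOT match this pattern.
CABLE_ID = re.compile(r"^(\d{2})([A-Z]+)(\d+)$")


def parse_cable_id(cable_id: str) -> tuple[int, str, int]:
    """Split a cable ID into (year, origin, record number)."""
    match = CABLE_ID.match(cable_id)
    if match is None:
        raise ValueError(f"unrecognised cable ID: {cable_id!r}")
    yy, origin, record = match.groups()
    # Hypothetical pivot: two-digit years 50-99 map to 19xx, 00-49 to 20xx.
    year = 1900 + int(yy) if int(yy) >= 50 else 2000 + int(yy)
    return year, origin, int(record)


print(parse_cable_id("04MADRID893"))    # (2004, 'MADRID', 893)
print(parse_cable_id("09STATE106750"))  # (2009, 'STATE', 106750)
```

Because the greedy `[A-Z]+` stops at the first digit, the record number is whatever trails the origin.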

What's that record number? Let's investigate. There are some cables sent in December 2004 from the Madrid Embassy, such as 04MADRID4887 (dated December 29, 2004), whose record number is 4887. Another message, sent in February, has the ID 04MADRID527, record number 527. Looking at other cables dated January, it seems obvious that the record number starts at 1 and goes up one by one through the year: it is a simple ordinal value. Thanks to this simple rule, and reading the last cables from the Madrid Embassy in December 2004, we know it sent about 4,900 cables that year alone.
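The counting rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's cablegate-sp code: it assumes the same <yy><ORIGIN><record> ID layout, and the function name estimate_dark_matter and its output keys are hypothetical. For each (origin, year) it takes the highest record number seen as the minimum number of cables sent, and compares it with how many of those cables are in the leak.

```python
import re
from collections import defaultdict

# Assumed ID layout: <2-digit year><ORIGIN in capitals><record number>.
CABLE_ID = re.compile(r"^(\d{2})([A-Z]+)(\d+)$")


def estimate_dark_matter(cable_ids):
    """Per (origin, year): highest record number vs. cables actually leaked.

    The highest record number is only a lower bound on the cables sent
    that year, since the last cables of the year may be missing.
    """
    records = defaultdict(set)
    for cable_id in cable_ids:
        match = CABLE_ID.match(cable_id)
        if match is None:
            continue  # skip IDs that do not follow the assumed layout
        yy, origin, record = match.groups()
        # Hypothetical century pivot: 50-99 -> 19xx, 00-49 -> 20xx.
        year = 1900 + int(yy) if int(yy) >= 50 else 2000 + int(yy)
        records[(origin, year)].add(int(record))

    summary = {}
    for key, numbers in sorted(records.items()):
        sent_at_least = max(numbers)
        leaked = len(numbers)
        summary[key] = {
            "sent_at_least": sent_at_least,
            "leaked": leaked,
            "not_leaked": sent_at_least - leaked,
            "leaked_fraction": leaked / sent_at_least,
        }
    return summary


# The three 2004 Madrid IDs mentioned in this post.
ids = ["04MADRID527", "04MADRID893", "04MADRID4887"]
print(estimate_dark_matter(ids)[("MADRID", 2004)])
```

With only these three IDs, Madrid 2004 shows at least 4,887 cables sent, 3 leaked and 4,884 missing; summing not_leaked over every origin and year gives the kind of totals discussed above.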

Ideally, the last cable of the year from each embassy would be available, but the Cablegate data is incomplete. Only a fraction of the leaked messages has been published so far, and the last cables of a given year may not be in Cablegate at all. The highest record number observed is therefore a lower bound on the real total, but, as the graphs show, the method still gives a useful approximation.

The code used for the calculations is available on GitHub (cablegate-sp) under a BSD license.

Out of sight, out of mind.

One month after the first cable release, only two thousand messages have been published. At this rate, it will take a decade to release all of Cablegate. Perhaps not all messages are as relevant as those released so far (e.g. boring messages about visas). But if WikiLeaks has raised such a stir with just 2,000 cables, I cannot imagine what other secrets remain among the thousands still unreleased (although top-secret cables use other networks).
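The decade figure is simple arithmetic, assuming the pace of roughly 2,000 published cables per month stays constant and taking WikiLeaks' count of 251,287 cables:

```latex
\frac{251{,}287\ \text{cables}}{2{,}000\ \text{cables/month}} \approx 126\ \text{months} \approx 10.5\ \text{years}
```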

Anyway, I'm sure there is still a lot of data-mining work to do with the cables.

(Spanish version of this article: Cablegate: Lo que no está en WikiLeaks).

PS (December 30th, 2010): Ricardo Estalmán pointed me to this Wikipedia entry on the German tank problem from World War II:

«Suppose one is an Allied intelligence analyst during World War II, and one has some serial numbers of captured German tanks. Further, assume that the tanks are numbered sequentially from 1 to N. How does one estimate the total number of tanks?»

The Cablegate case is quite similar. I will update the estimate using the formula given in that article as soon as possible (Xmas days!).
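For reference, the frequentist estimator usually quoted for the German tank problem is N = m + m/k - 1, where m is the largest serial number observed and k is the number of distinct serials in the sample. Below is a minimal Python sketch, assuming the record numbers behave like tank serials, that is, a uniform random sample drawn without replacement from 1..N. Leaked or published cables are unlikely to satisfy that assumption exactly. The function name german_tank_estimate is hypothetical.

```python
def german_tank_estimate(serials):
    """Minimum-variance unbiased estimate of N, assuming the serials are a
    uniform random sample (without replacement) from 1..N.

    m is the largest serial observed, k the number of distinct serials.
    """
    observed = set(serials)
    if not observed:
        raise ValueError("need at least one serial number")
    m = max(observed)
    k = len(observed)
    return m + m / k - 1


# Record numbers of the three 2004 Madrid cables mentioned in this post.
print(german_tank_estimate([527, 893, 4887]))  # 6515.0
```

With only three hand-picked cables the estimate is very rough. It overshoots the ~4,900 implied by the December 2004 cables, which is why a large, reasonably random sample of record numbers per embassy and year matters.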
