Desktop Couch IRC talk

Thursday, September 3rd, 2009

Yesterday I gave an introductory talk on Desktop Couch as part of Ubuntu Developer Week, covering what it is, how to use the APIs to store data, and who else is using it.

The full logs of the session are available at https://wiki.ubuntu.com/MeetingLogs/devweek0909/CouchDB, and are reposted here (with some nicer formatting :))

Introduction

  • Hi, all. Welcome to Stuart’s House Of Desktop Couch Knowledge.
  • I’m Stuart Langridge, and I hack on the desktopcouch project!
  • Over the next hour I’m going to explain what desktopcouch is, how to use it, who else is using it, and some of the things you will find useful to know about the project.
  • I’ll talk for a section, and then stop for questions.
  • Please feel free to ask questions in #ubuntu-classroom-chat, and I’ll look at the end of each section to see which questions have been posted. Ask at any time; you don’t have to wait until the end of a section.
  • You should prefix your question in #ubuntu-classroom-chat with QUESTION: so that I notice it :-)

What is desktopcouch?

  • So, firstly, what’s desktopcouch?
  • Well, it’s giving every Ubuntu user a CouchDB on their desktop.
  • CouchDB is an Apache project to provide a document-oriented database. If you’re familiar with SQL databases, where you define a table and then a table has a number of rows and each row has the same columns in it…this is not like that.
  • Instead, in CouchDB you store “documents”, where each document is a set of key/value pairs. Think of this like a Python dictionary, or a JSON document.
  • So you can store one document like this:
  • { "name": "Stuart Langridge", "project": "Desktop Couch", "hair_colour": "red" }
  • and another document which is completely different:
  • { "name": "Stuart Langridge", "outgoings": [ { "shop": "In and Out Burger", "cost": "$12.99" } , { "shop": "Ferrari dealership", "cost": "$175000" } ] }
  • The interface to CouchDB is pure HTTP. Just like the web. It’s RESTful, for those of you who are familiar with web development.
  • This means that every programming language already knows how to speak it, at least in basic terms.
  • CouchDB also comes with an in-built in-browser editor, so you can look at and browse around and edit all the data stored in it.
  • So, the desktopcouch project is all about providing these databases for every user, so each user’s applications can store their data all in one place.
  • You can have as many databases in your desktop Couch as you or your applications want, and storage is unlimited.
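As a rough illustration of the "set of key/value pairs" idea, here is a small Python sketch using the two example documents from above. It only uses the standard json module and does not talk to CouchDB at all; the helper names to_json and shared_keys are made up for this example.

```python
import json

# The two documents from the talk; nothing forces them to share the same keys.
profile = {
    "name": "Stuart Langridge",
    "project": "Desktop Couch",
    "hair_colour": "red",
}
spending = {
    "name": "Stuart Langridge",
    "outgoings": [
        {"shop": "In and Out Burger", "cost": "$12.99"},
        {"shop": "Ferrari dealership", "cost": "$175000"},
    ],
}


def to_json(document):
    """Serialise a document as JSON text, the form it travels in over HTTP."""
    return json.dumps(document, sort_keys=True)


def shared_keys(a, b):
    """Return the keys that two documents have in common, sorted."""
    return sorted(set(a) & set(b))


print(to_json(profile))
print(shared_keys(profile, spending))
```

The only key the two documents share is "name", and that is fine: there is no table definition for either of them to violate.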

Sharing data between machines

  • Desktop Couch is built to do “replication”, synchronizing your data between different machines. So if you have, say, Firefox storing your bookmarks in your desktop Couch on your laptop, those bookmarks could be automatically synchronized to your Mini 9 netbook, or to your desktop computer.
  • They can also be synchronized to Ubuntu One, or another running-in-the-cloud service, so you can see that data on the web, or synchronize between two machines that aren’t on the same network.
  • So you’ve got your bookmarks everywhere. Your own personal del.icio.us, but it’s your data, not locked up solely on anyone else’s servers.
  • Imagine if your apps stored their preferences in desktop Couch. Santa Claus brings you a new laptop, you plug it in, pair it with your existing machine, and all your apps are set up. No work.

Sharing data between applications

  • But sharing data between machines is only half the win. The other half is sharing data between applications.
  • I want all my stuff to collaborate. I don’t want to have to “import” data from one program to another, if I switch from Thunderbird to Evolution to KMail to mutt.
  • I want any application to know about my address book, to allow any application to easily add “send this to another person”, so that I can work with people I know.
  • I want to be able to store my songs in Banshee and rate them in Rhythmbox if I want — when people say that the Ubuntu desktop is about choice, that shouldn’t mean choosing between different incompatible data silos. I can choose one application and then choose another, you can choose a third, and we can all cooperate on the data.
  • My choice should be how I use my applications, and how they work; I shouldn’t have to choose between underlying data storage. With apps using desktopcouch I don’t have to.
  • All my data is stored in a unified place in a singular way — and I can look at my data any time I want, no matter which application put it there! Collaboration is what the open source desktop is good at, because we’re all working together. It should be easy to collaborate on data.
  • That’s a brief summary of what desktopcouch *is*: any questions so far before we get on to the meat: how do you actually Use This Thing?

Questions

  • mandel_macaque (hey, mandel :)) — that’s what the desktopcouch mailing list is for, so people can get together and talk about what should be in a standard record
  • there’s no ivory tower which hands down standard formats from the top of the mountain :)
  • mandel_macaque’s question was: will there be a “group” that will try to define standard records?
  • <mhall119|work> QUESTION: how does desktopcouch differ from/replace gconf?
  • mhall119|work, desktopcouch is for storing all sorts of user data. It’s not just about preferences, although you could store preferences in it
  • <sandy|lu1k> QUESTION: What about performance? Why would Banshee/rhythmbox switch to a slower way to store metadata?
  • sandy|lu1k, performance hasn’t really been an issue in our testing, and couchdb provides some serious advantages over existing things like sqlite or text files, like replication and user browseability
  • <mandel_macaque> QUESTIONS: Is desktopcouch creating the required infrastructure to allow user sync, or should applications take care of that?
  • desktopcouch is providing infrastructure and UI to “pair” machines and handle all the replication; applications do not have to know or worry about data being replicated to your other computers
  • <jopojop> QUESTION: can you store media like images, audio and video?
  • jopojop, not really — couchdb is designed for textual, key/value pair, dictionary data, not for binary data
  • it’s possible to store binary data in desktopcouch, but I’d suggest not importing your whole mp3 collection into it; store the metadata. The filesystem is good at handling binary data
  • <sandy|lu1k> QUESTION the real performance concern that media apps have is query speed for doing quick searches
  • sandy|lu1k, that’s something we’d really like to see more experimentation with. couchdb’s views architecture makes it really, really quick for some uses,

Using desktopcouch.records, the Python API

  • ok, let’s talk about how to use it :)
  • The easiest way to use desktopcouch is from Python, using the desktopcouch.records module.
  • This is installed by default in Karmic.
  • An individual “document” in desktop Couch is called a “record”, because there are certain extra things that are in a record over and above what stock CouchDB requires, and desktopcouch.records takes care of this for you.
  • First, a bit of example Python code! This is taken from the docs at /usr/share/doc/python-desktopcouch-records/api/records.txt.
  • >>> from desktopcouch.records.server import CouchDatabase
  • >>> from desktopcouch.records.record import Record
  • >>> my_database = CouchDatabase("testing", create=True)
  • # get the “testing” database. In your desktop Couch you can have many databases; each application can have its own with whatever name it wants. If it doesn’t exist already, this creates it.
  • >>> my_record = Record({ "name": "Stuart Langridge", "project": "Desktop Couch", "hair_colour": "red" }, record_type='http://example.com/testrecord')
  • # Create a record, currently not stored anywhere. Records must have a “record type”, a URL which is unique to this sort of record.
  • >>> my_record["weight"] = "too high!"
  • # A record works just like a Python dictionary, so you can add and remove keys from it.
  • >>> my_record_id = my_database.put_record(my_record)
  • # Actually save the record into the database. Records each have a unique ID; if you don’t specify one, the records API will choose one for you, and return it.
  • >>> fetched_record = my_database.get_record(my_record_id)
  • # You can retrieve records by ID
  • >>> print fetched_record["name"]
  • Stuart Langridge
  • # and the record you get back is a dictionary, just like when you’re creating it.
  • That’s some very basic code for working with desktop Couch; it’s dead easy to save records into the database.
  • You can work with it like any key/value pair database.
  • And then desktopcouch itself takes care of things like replicating your data to your netbook and your desktop without you having to do anything at all.
  • And the users of your application can see their data directly by using the web interface; no more grovelling around in dotfiles or sqlite3 databases from the command line to work out what an application has stored.
  • You can get at the web interface by browsing to file:///home/aquarius/.local/share/desktop-couch/couchdb.html in a web browser, which will take you to the right place.
  • (er, if your username is aquarius you can, anyway :))
  • I’ll stop there for some questions about this section!
  • ah, people in the chat channel are trying it out. You might need to install python-desktopcouch-records
  • the version in karmic right now has a couple of strange outstanding bugs which we’re working on which might make it a little difficult to follow along
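If you can't follow along with desktopcouch.records installed, this plain-Python sketch shows roughly what a record amounts to once it is stored: an ordinary dictionary with a couple of extra keys. It is an assumption that the record type sits under a "record_type" key beside the normal fields; "_id" is CouchDB's standard document ID field. The function as_stored_document is invented for this example and is not part of the Records API.

```python
import json
import uuid


def as_stored_document(fields, record_type, record_id=None):
    """Build the plain dictionary a record might be stored as.

    Assumption: the record type lives under a "record_type" key next to
    the ordinary fields. "_id" is CouchDB's standard document ID field.
    """
    document = dict(fields)
    document["record_type"] = record_type
    # Mirror the behaviour described in the talk: pick an ID if none is given.
    document["_id"] = record_id or uuid.uuid4().hex
    return document


doc = as_stored_document(
    {"name": "Stuart Langridge", "project": "Desktop Couch", "hair_colour": "red"},
    record_type="http://example.com/testrecord",
)
doc["weight"] = "too high!"  # still an ordinary dictionary
print(json.dumps(doc, indent=2, sort_keys=True))
```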

Questions

  • <mandel_macaque> QUESTION: (about views) which is the policy for design documents (views), one per app?
  • mandel_macaque, no policy, thus far. Create whichever design docs you want to — having one per app sounds sensible, but an app might want more than one
  • mandel_macaque, this is an ideal topic to bring up for discussion on the mailing list :)
  • <test1> QUESTION: Does desktopCouch/CouchDB provide a means to control access to my data on a per application basis? I would not necessarily want any application to be able to access any data – I might want to silo two mail apps to different databases, etc.
  • test1, at the moment it does not (in much the same way as the filesystem doesn’t), but it would be possible to build that in
  • <mhall119|work> QUESTION: how does the HTML interact with couchdb? Javascript?
  • mhall119|work, (I assume you mean: how does the HTML web interface for browsing your data interact with couchdb?) yes, JavaScript
  • <AntoineLeclair> QUESTION: so when I do CRUD, it’s done locally, then replicated on the web DB? (and replicated locally from the web some other time to keep sync?)
  • AntoineLeclair, yes, broadly
  • <F30> QUESTION: So far, this sounds a bit like the registry which we all know and hate from the Windows world: Do you really think all applications should put their data into one monolithic database, which in the end gets messed up?
  • F30, having data in one place allows you to do things like replicate that data and make generalisations about it. We have the advantage that desktopcouch is built on couchdb, which is not only dead robust but also open source, unlike the registry :)
  • <test1> In terms of replication – does CouchDB automate data merging (i.e. how does it handle conflict resolution) if I were to modify my bookmarks on multiple machines before replication took place?
  • test1, couch’s approach is “eventual consistency”. In the case of actual conflicts, desktopcouch stores both versions and marks them as conflicting; it’s up to the application that uses the data to resolve those conflicts in some way
  • perhaps by asking the user, or applying some algorithmic knowledge
  • the application knows way more about what the data is than couch itself does
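To make "it's up to the application" a little more concrete, here is one possible shape for application-side conflict handling. This is only a sketch: it assumes the application has somehow obtained both conflicting versions as plain dictionaries (how you fetch them is not shown), and merge_versions is a made-up helper, not a desktopcouch function. Keys that agree or exist on only one side are merged automatically; keys that genuinely differ are handed back so the application can ask the user.

```python
def merge_versions(mine, theirs):
    """Merge two conflicting versions of one record.

    Returns (merged, needs_user): `merged` holds every key the two versions
    do not disagree about, and `needs_user` lists the keys whose values
    differ and need a decision from the application or the user.
    """
    merged = {}
    needs_user = []
    for key in sorted(set(mine) | set(theirs)):
        if key not in theirs:
            merged[key] = mine[key]
        elif key not in mine:
            merged[key] = theirs[key]
        elif mine[key] == theirs[key]:
            merged[key] = mine[key]
        else:
            needs_user.append(key)
    return merged, needs_user


laptop = {"url": "http://example.com/", "title": "Example", "tags": ["work"]}
netbook = {"url": "http://example.com/", "title": "Example site", "rating": 5}

merged, needs_user = merge_versions(laptop, netbook)
print(merged)
print(needs_user)
```

One limitation of this simple approach: a key deliberately deleted on one machine comes back from the other, because with only two versions you cannot tell "deleted" from "never had it".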

Using views to query your desktop Couch

  • Next, on to views.
  • Being able to retrieve records one at a time is nice, but it’s not what you want to do most of the time.
  • To get records that match some criteria, use views.
  • Views are sort of like SQL queries and sort of not. Don’t try and think in terms of a relational database.
  • The best reference on views is the CouchDB book, available for free online (and still being worked on): the views chapter is at http://books.couchdb.org/relax/design-documents/views
  • Basically, a view is a JavaScript function.
  • When you request the records from a view, desktopcouch runs your view function against every document in the database and returns the results.
  • So, to return all documents with “name”: “Stuart Langridge”, the view function would look like this:
  • function(doc) { if (doc.name == "Stuart Langridge") emit(doc._id, doc) }
  • This sort of thinking takes a little getting used to, but you can do anything you want with it once you get into it
  • desktopcouch.records helps you create views and request them
  • # creating a view
  • >>> map_js = """function(doc) { emit(doc._id, null) }"""
  • >>> db.add_view("name of my view", map_js, None, "name of the view container")
  • # requesting the records that the view returns
  • >>> result = db.execute_view("name of my view", "name of the view container")
  • The “view container”, called a “design doc”, is a collection of views. So you can group your views together into different design docs.
  • (hence mandel_macaque’s question earlier about whether each app that uses the data in a database should have its own design doc(s). I suggest yes.)
  • Advanced people who know about map/reduce should know that this is a map/reduce approach.
  • You can also specify a reduce function (that’s the None parameter in the add_view function above)
  • The CouchDB book has all the information you’ll need on views and the complexities of them.
  • Questions on views? :-)
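If the map function idea feels odd, it can help to imitate it in plain Python. The sketch below is not how CouchDB implements views; it just mimics the behaviour described above: call a map function on every document, collect whatever it emits, and optionally reduce the result. The names run_view and by_name_stuart are invented for the example, and the reduce step is simplified (a real CouchDB reduce function takes keys, values and a rereduce flag).

```python
def run_view(documents, map_fn, reduce_fn=None):
    """Apply map_fn to every document and collect the emitted rows."""
    rows = []

    def emit(key, value):
        rows.append((key, value))

    for doc in documents:
        map_fn(doc, emit)
    rows.sort(key=lambda row: row[0])  # view rows come back ordered by key
    if reduce_fn is None:
        return rows
    # Simplified reduce: hand all emitted values to reduce_fn in one go.
    return reduce_fn([value for _key, value in rows])


def by_name_stuart(doc, emit):
    # Python counterpart of the JavaScript map function shown earlier.
    if doc.get("name") == "Stuart Langridge":
        emit(doc["_id"], doc)


documents = [
    {"_id": "a1", "name": "Stuart Langridge", "project": "Desktop Couch"},
    {"_id": "b2", "name": "Someone Else"},
    {"_id": "c3", "name": "Stuart Langridge", "outgoings": []},
]

rows = run_view(documents, by_name_stuart)
print([key for key, _doc in rows])

count = run_view(documents, by_name_stuart, reduce_fn=len)
print(count)
```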

Questions

  • <mandel_macaque> QUESTION: taking as an example the contacts record, when we have to perform a diff we will have to take into account the application_annotations key, which is shared among apps. How can my app know what to do with other app data?
  • (bit of background for those not quite as au fait with desktopcouch: each desktopcouch record has a key called “application_annotations”, and under that there is a key for each application that wants to store data specific to that application about this record)
  • (so Firefox, for example, while storing a bookmark, would store url and title as top-level fields, and the Firefox internal ID of the bookmark as application_annotations.Firefox.internal_id or similar)
  • mandel_macaque, what you have to do with data in application_annotations is preserve it. You are on your honour to not delete another app’s metadata :)
  • <mhall119|work> QUESTION: might it be better to standardize on views, rather than records? So, Evolution and TBird might have their own database, with their own Contact record, but a single “All Contacts” view would aggregate both?
  • mhall119|work, the idea behind collaboration is that everyone co-operates on the actual data rather than views. So it’s better if each app stores the data in a standard format on which they collaborate, and then has its own views to get that data how *it* wants.
  • <FND> mandel_macaque: what if I wanted to wipe all Firefox data because I want a fresh start? right now, I can just delete ~/.mozilla/firefox/myProfile
  • I’m concerned that as a power user, I lose direct access
  • FND, you can delete the firefox database from the web interface, or from the command line. "curl -X DELETE http://localhost:5984/firefox"
  • or using desktopcouch.records, which is nicer: python -c "from desktopcouch.records.server import CouchDatabase; db = CouchDatabase('firefox'); db.delete()"
  • <mgunes> QUESTION: Wouldn’t deleting your profile simply reflect as deleted records on the CouchDB instance?
  • mgunes, how deletions affect applications that used the deleted data depends on the application. For example, there’s obviously a distinction between “I deleted this because I want to create a new one” and “I deleted this but I want to be able to get it back later”
  • the couchdb upstream team are currently working on having full history for all records, which will make this sort of work easier
  • <mhall119|work> QUESTION: if collaboration is to be done on the database level, there wouldn’t be a “Firefox” database, there would be a “Bookmarks” database, correct?
  • mhall119|work, yes, absolutely. My mistake in typing, sorry :)
  • <mhall119|work> QUESTION: for those that don’t want to mess with python or curl, will there be a CLI program for manipulating couchdb?
  • mhall119|work, there isn’t at the moment (curl or desktopcouch.records are pretty easy, we think) but I’m sure the bunch of talented people I’m talking to could whip up a program (or a set of bash aliases) in short order if there was desire for it
  • :-)
  • that would be a cool addition to desktopcouch
  • <mandel_macaque> QUESTION: Since couchdb stores all the version of my documents, will we have something like time machine in OS X? The data will already be there :D
  • mandel_macaque, certainly the infrastructure for that would be there once couchdb has full history and lots of apps are using desktopcouch
  • if someone writes it I’ll use it ;-0
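Going back to the application_annotations answer: "preserve it" in practice means never rebuilding a record from scratch when you save it. Here is a hedged sketch of an update helper that changes the shared top-level fields and only its own application's annotations. It works on plain dictionaries; update_record and the application name "MyBrowser" are hypothetical, and the Firefox internal_id value is made up.

```python
import copy


def update_record(existing, new_fields, app_name, app_data):
    """Return an updated copy of `existing`.

    Only the shared top-level fields and this application's own entry under
    "application_annotations" are changed; other applications' annotations
    are carried over untouched.
    """
    updated = copy.deepcopy(existing)
    for key, value in new_fields.items():
        # Never let a bulk update replace the whole annotations dictionary.
        if key != "application_annotations":
            updated[key] = value
    annotations = updated.setdefault("application_annotations", {})
    annotations.setdefault(app_name, {}).update(app_data)
    return updated


bookmark = {
    "url": "http://example.com/",
    "title": "Example",
    "application_annotations": {"Firefox": {"internal_id": "42"}},
}

updated = update_record(
    bookmark, {"title": "Example site"}, "MyBrowser", {"folder": "unsorted"}
)
print(updated["application_annotations"])
```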

Accessing desktop Couch from other languages

  • It’s not just Python, though. The Python Records API is in package python-desktopcouch-records, but there are also others.
  • couchdb-glib is a library to access desktopcouch from C.
  • Some example code (I don’t know much about C, but rodrigo_ wrote couchdb-glib and can answer all your questions :-))
  • couchdb = couchdb_new (hostname);
  • Create a database -> couchdb_create_database()
  • Delete a database -> couchdb_delete_database()
  • List documents in a database -> couchdb_list_documents()
  • More details are available for couchdb-glib at http://git.gnome.org./cgit/couchdb-glib/tree/README
  • We’re also working on a library to access desktopcouch from JavaScript, so you can use it from things like Firefox extensions of gjs.
  • er, *or* gjs :)
  • And because the access method for desktop Couch is HTTP, it’s easy to write an access library for any other language that you choose.
  • You can, of course, talk directly to desktop Couch using HTTP yourself, if you choose; you don’t have to use the Records API, or you might be implementing an access library for Ruby or Perl or Befunge or Smalltalk or Vala or something.
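For anyone tempted to write such a library, this is roughly what storing a document looks like at the HTTP level: a PUT of JSON to /database/document-id. The sketch below (Python 3, standard library only) just builds the request without sending it. It leaves out two things a real desktop Couch needs, the correct port and an OAuth signature, so treat the port 5984 and the helper build_put_request as placeholders for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_put_request(port, database, doc_id, document):
    """Build (but do not send) an HTTP PUT that would store `document`.

    Not included: OAuth signing, which desktop Couch requires, and looking
    up the real port.
    """
    url = "http://localhost:%d/%s/%s" % (
        port,
        quote(database, safe=""),
        quote(doc_id, safe=""),
    )
    body = json.dumps(document).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_put_request(
    5984,  # placeholder port
    "testing",
    "my-first-record",
    {"name": "Stuart Langridge", "project": "Desktop Couch"},
)
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing the signed request to urllib.request.urlopen.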

Underlying desktopcouch technical detail

  • desktopcouch.records (and couchdb-glib) do a certain amount of undercover work for you, which you’ll need to do yourself if you talk to desktop Couch directly; to explain that I need to delve into some deeper technical detail.
  • Your desktop Couch runs on a TCP port, listening to localhost only, which is randomly selected when it starts up. There is a D-Bus API to get that port.
  • So, to find out which port you need to connect to by HTTP, call the D-Bus API. (This API will also start your desktop Couch if it’s not already running.)
  • $ dbus-send --session --dest=org.desktopcouch.CouchDB --print-reply --type=method_call / org.desktopcouch.CouchDB.getPort
  • (desktopcouch.records does this for you.)
  • You must also be authenticated to read any data from your desktop Couch. Authentication is done with OAuth, so every HTTP request to desktopcouch must have a valid OAuth signature.
  • The OAuth details you need to sign requests are stored in the Gnome keyring.
  • (again, desktopcouch.records takes care of this for you so you don’t have to think about it.)
  • As I said above, every record must have a record_type, a URL which identifies what sort of record this is. So, if your recipe application stores all your favourite recipes in desktopcouch, you need to define a URL as the record type for “recipe records”.
  • That URL should point to a human-readable description of the fields in records of that type: so for a recipe document you might have name, ingredients, cooking instructions, oven heat.
  • The URL is there so other developers can find out what should be stored in a record, so more than one application can collaborate on storing data.
  • If I write a different recipe application, mine should work with records of the same format; that way I don’t lose all my recipes if I change applications, and me and the developers of the first app can collaborate.
  • Let’s take some more questions.
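If you are not using desktopcouch.records, one low-tech way to get the port from a script is to run the dbus-send command above and pick the number out of its reply. The sketch below does that with the standard library. The sample reply is illustrative rather than captured from a real session, and it is an assumption that the port comes back as an int32 (the regular expression also accepts a few other integer types just in case).

```python
import re
import subprocess

# The same call as the dbus-send command line shown above.
DBUS_COMMAND = [
    "dbus-send", "--session", "--dest=org.desktopcouch.CouchDB",
    "--print-reply", "--type=method_call",
    "/", "org.desktopcouch.CouchDB.getPort",
]


def parse_port(reply_text):
    """Pull the port number out of dbus-send's --print-reply output."""
    match = re.search(r"\bu?int(?:16|32)\s+(\d+)", reply_text)
    if match is None:
        raise ValueError("no integer found in D-Bus reply")
    return int(match.group(1))


def get_port():
    """Ask the session bus for the desktop Couch port (needs a running session)."""
    reply = subprocess.check_output(DBUS_COMMAND)
    return parse_port(reply.decode("utf-8"))


# Illustrative reply text, not a real capture.
sample_reply = (
    "method return sender=:1.52 -> dest=:1.63 reply_serial=2\n"
    "   int32 53210\n"
)
print(parse_port(sample_reply))
```

With the port in hand you still need the OAuth details before any HTTP request will be accepted.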

Questions

  • <mgunes> QUESTION: Is there any plan/need for Desktopcouch itself to talk to Midgard, for access to data stored by applications that use it? And did you investigate Midgard before going with CouchDB?
  • There’s been a lot of conversation between Midgard and CouchDB and desktopcouch and others
  • midgard implements the CouchDB replication API, so you can replicate your desktopcouch data to a midgard server
  • <FND> to clarify, another way to express my concerns – and I hate to be such a nagging naysayer here – is “transparency” – inspecting files is generally a whole lot more obvious than inspecting a DB (even if there’s a nifty web UI)
  • FND, applications are increasingly using databases rather than flat files anyway, because of the advantages you get from a database — as was asked about above, media players are using sqlite DBs and so on for quick searchability and indexability
  • <bas89> QUESTION: is couchDB an ubuntu-only project or will it be available on fedora or my mobile phone?
  • couchdb runs, like, everywhere. It’s available on Ubuntu, Fedora, other Linux distros, Windows, OS X…
  • the couchdb upstream project love the idea of things like mobile phones running couch, and they’re working on that :)
  • desktopcouch, which sets up an individual couchdb for every user, is all written in Python and doesn’t do anything Ubuntu-specific, so it should be perfectly possible to run it on other Linux distros (and there’s a chap looking at getting it running on fedora)
  • and since it’s all Python it should be possible to have it on other platforms too, like Windows or the Mac.
  • <FND> QUESTION: by making applications rely on CouchDB, isn’t there a risk of diverging from other distros
  • desktopcouch isn’t Ubuntu-specific. There was lots of interest at the Gran Canaria Desktop Summit this year

Using Quickly

  • There is an Even Easier way to have applications use desktop Couch for data storage.
  • One of the really cool things in karmic is Quickly: https://wiki.ubuntu.com/Quickly
  • quickly helps you make applications…quickly. :-)
  • and apps created with Quickly use desktopcouch for data storage.
  • If you haven’t seen Quickly, it’s a way of easily handling all the boilerplate stuff you have to do to get a project going; “quickly create ubuntu-project myproject” gives you a “myproject” folder containing a Python project that works but doesn’t do anything.
  • So you can concentrate on writing the code to do what you want, rather than boilerplate to get started.
  • It’s dead neat :)
  • Anyway, quickly projects are set up to save application preferences into desktop Couch by default. So you get the advantages of using desktop Couch (replication, browsing of data) for every quickly project automatically.
  • The quickly guys have also contributed CouchGrid, a gtk.TreeView which is built on top of desktopcouch, so that it will display records from a desktopcouch database.
  • “quickly tutorial ubuntu-project” has lots of information about CouchGrid and how to use it.
  • Any questions about quickly? (I can’t guarantee to be able to answer them, but #quickly is great for this.)
  • I’m going to race through the last section since I have 3 mins, and then try and answer the last few questions :)

Who’s using desktop Couch already?

  • So, who’s already using desktopcouch?
  • Quickly, as mentioned, uses desktopcouch for preferences in projects it creates.
  • The Gwibber team are working on using desktopcouch for data storage
  • Bindwood (http://launchpad.net/bindwood) is a Firefox extension to store bookmarks in desktopcouch
  • Macaco-contacts is transitioning to work with desktopcouch for contacts storage (http://www.themacaque.com/?p=248)
  • (perhaps :-))
  • Evolution can now, in the evolution-couchdb package, store all contacts in desktopcouch
  • Akonadi, the KDE project’s contacts and PIM server, can also store contacts in desktopcouch
  • These last three are interesting, because everyone’s collaborating on a standard record type and record format for “contacts”, so Evolution and Akonadi and Macaco-contacts will all share information.
  • So if you switch from Gnome to KDE, you won’t lose your address book.
  • I’m really keen that this happens, that applications that store similar data (think of mail clients and addressbooks, as above, or media players storing metadata and ratings, for example) should collaborate on standard formats.
  • Details about the desktopcouch project can be found at http://www.freedesktop.org/wiki/Specifications/desktopcouch
  • There’s a mailing list at http://groups.google.com/group/desktop-couchdb
  • The code is developed in Launchpad: http://launchpad.net/desktopcouch
  • The best place to ask questions generally is the #ubuntuone channel; all the desktopcouch developers are hanging out there
  • The best place to ask questions that you have right now is…right now, so go ahead and ask in #ubuntu-classroom-chat, and I’ll answer any other questions you have!
  • in the two minutes I have remaining ;-)

Questions

  • <bas69> QUESTION: whats about akonadi? is there competition?
  • akonadi has a desktopcouch back end for contacts, which was demonstrated at the Gran Canaria Desktop Summit — it’s dead neat to save a contact with Akonadi and then load it with Evolution :)
  • <alourie> aquarius: QUESTION: does that mean that ubuntuone also uses it?
  • desktopcouch lets you replicate your data between all your machines on your network — Ubuntu One has a cloud service so you can also send your data up into the cloud, so you can get at it from the web and replicate between machines anywhere on the internet
  • mgunes, yes indeed :)
  • ok I need to stop now, out of time. Next is kees, who I hope will forgive me for overrunning!