Thursday, December 22nd, 2011
Merry Christmas from the Ubuntu One team! As announced at UDS in Orlando in November, we’ve been working on a project that lets application developers sync data to Ubuntu One, and it has now reached the tech preview stage. Here are the details.
U1DB is a database API for synchronised databases of JSON documents. It’s simple to use in applications, and lets apps store documents and synchronise them between machines and devices. U1DB itself is not a database: it’s an API and data model that can be backed by any database for storage. This means that you can use U1DB on different platforms, from different languages, backed by different databases, and sync between all of them.
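To make "an API backed by any database" a little more concrete, here is a minimal sketch. The names `DocumentStore`, `MemoryStore`, `SQLiteStore` and `save_todo` are hypothetical and are not part of the U1DB API; the only point is that application code written against one small interface can run unchanged on two different storage engines (a Python dict and SQLite from the standard library).

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class DocumentStore(ABC):
    """Hypothetical storage interface: the API layer only needs these calls."""

    @abstractmethod
    def put(self, doc_id: str, content: str) -> None: ...

    @abstractmethod
    def get(self, doc_id: str) -> Optional[str]: ...


class MemoryStore(DocumentStore):
    """Keeps JSON strings in a plain dict."""

    def __init__(self) -> None:
        self._docs: dict = {}

    def put(self, doc_id, content):
        self._docs[doc_id] = content

    def get(self, doc_id):
        return self._docs.get(doc_id)


class SQLiteStore(DocumentStore):
    """Keeps the same JSON strings in an SQLite table."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, content TEXT)")

    def put(self, doc_id, content):
        self._conn.execute(
            "INSERT OR REPLACE INTO docs (id, content) VALUES (?, ?)",
            (doc_id, content))
        self._conn.commit()

    def get(self, doc_id):
        row = self._conn.execute(
            "SELECT content FROM docs WHERE id = ?", (doc_id,)).fetchone()
        return row[0] if row else None


def save_todo(store: DocumentStore, doc_id: str, title: str) -> None:
    # Application code sees only the interface, never the backend.
    store.put(doc_id, json.dumps({"title": title, "done": False}))


for store in (MemoryStore(), SQLiteStore()):
    save_todo(store, "todo-1", "Buy milk")
    print(json.loads(store.get("todo-1"))["title"])  # Buy milk
```

Swapping the backend changes nothing in `save_todo`, which is the property that lets differently backed U1DBs share one data model.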
Data sync is an essential part of what we want to offer with Ubuntu One. We already offer file sync, and that’s also part of our developer story (the APIs for file sync and music streaming are documented at https://one.ubuntu.com/developer/); U1DB is designed to offer data sync. Some information in your personal cloud is best stored as files: your music, your photos, letters written in Word, things you want to back up. Applications, however, work with data: contacts, metadata about your files, todo lists, preferences and settings, and most of what an application deals with. We’re building U1DB so that app developers can work with the same data on every platform and in every language, and save data and sync it between devices without having to manage that themselves.
We’ve now got far enough with U1DB to have a working implementation, and we want to get it out to all of you. We’re calling this a tech preview: it’s a working version of U1DB, and the intention is that developers look at it, play with it and start building on it. We’re very interested in hearing your thoughts on the current implementation, the API, and its use in applications. Give us your thoughts in the comments here or on the U1DB mailing list at https://launchpad.net/~u1db-discuss, or just join us in #u1db on freenode for a chat.
The tech preview is of the reference implementation. This is written in (and to be used from) Python, on Windows, Ubuntu or anywhere else Python runs, and it’s where we work out the algorithms and API used across all U1DB implementations. The tech preview contains the library for working with U1DBs from Python, plus an example server and client. Because U1DB syncs peer-to-peer, it’s perfectly possible to run your own server and sync to that, and the example server gives you one to play with.
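Since a server is just another peer, the basic idea of syncing can be illustrated with two instances of one class. This is a simplified sketch, not U1DB's actual sync protocol: it assumes each replica keeps a generation counter that goes up on every change, and remembers the last generation it has pulled from each peer, so a sync only transfers documents changed since then. The class and method names are hypothetical. It deliberately ignores conflicts: concurrent edits to the same document simply overwrite each other, which is exactly what U1DB's revision IDs are there to detect.

```python
import json
import uuid


class Replica:
    """A toy peer. A "server" is simply another Replica instance."""

    def __init__(self) -> None:
        self.replica_uid = uuid.uuid4().hex
        self.generation = 0   # bumped on every change stored here
        self.docs = {}        # doc_id -> JSON string
        self.changed_at = {}  # doc_id -> generation of its latest change
        self.seen = {}        # peer replica_uid -> peer generation last pulled

    def put(self, doc_id: str, content: dict) -> None:
        self.generation += 1
        self.docs[doc_id] = json.dumps(content)
        self.changed_at[doc_id] = self.generation

    def changes_since(self, generation: int):
        """Return (current generation, [(doc_id, json)] changed after it)."""
        changed = [(doc_id, self.docs[doc_id])
                   for doc_id, gen in self.changed_at.items() if gen > generation]
        return self.generation, changed

    def pull_from(self, peer: "Replica") -> int:
        """Apply the peer's changes since our last pull; return how many were new.

        No conflict detection: an incoming document overwrites the local one.
        """
        known = self.seen.get(peer.replica_uid, 0)
        peer_generation, changed = peer.changes_since(known)
        applied = 0
        for doc_id, content in changed:
            if self.docs.get(doc_id) != content:  # skip echoes of identical data
                self.put(doc_id, json.loads(content))
                applied += 1
        self.seen[peer.replica_uid] = peer_generation
        return applied

    def sync(self, peer: "Replica") -> None:
        """Two-way sync: pull in both directions."""
        self.pull_from(peer)
        peer.pull_from(self)


laptop, server = Replica(), Replica()
laptop.put("todo-1", {"title": "Buy milk"})
server.put("note-1", {"text": "Merry Christmas"})
laptop.sync(server)
print(laptop.docs == server.docs)  # True
```

Because `sync` is symmetric, nothing in the sketch distinguishes the server from any other device.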
The tech preview is mostly about getting input into the product so we can make sure we build something that is useful for people. We have also listed a number of open questions on detailed technical subjects, which we’d like opinions on from anyone interested in using U1DB or in writing a new implementation for another platform, language or database backend. Give us your thoughts on these too!
- In general, creating an API that is conceptually portable across many languages is difficult. For example, the reference implementation currently provides a Document object where doc_obj.content is a JSON string of the document content. This means that app developers using the Python API need to call json.loads(doc_obj.content) before they can edit the content of a Document. Should a Document be addressable as a dictionary? This is an obvious thing to do in Python, but it does not necessarily make sense across many platforms; how would you envisage a Document object looking in C? In Java? In Objective C? In your choice of language?
- Revision IDs for a U1DB Document are currently quite verbose, but this makes them easy to read (and makes it easier to debug issues). Should we use a less readable but more compact format for these version vectors?
- Ubuntu One’s U1DB server will have a direct HTTP API, so that apps can retrieve and store data directly in the cloud without syncing. The HTTP API is also used for syncing U1DBs to Ubuntu One. What form of authorization should be used for this HTTP API, both for syncing and for direct access? Other Ubuntu One services use OAuth 1.0; should we examine OAuth 2, or other alternatives, or is it more important to be able to use the same tokens and auth libraries as other Ubuntu One services?
- Indexing is a hard problem. Letting users supply their own code to do the indexing raises its own difficulties, and creating a reasonably thorough DSL is a lot of work. We’re currently taking the DSL route: index expressions are essentially a domain-specific language for querying a U1DB. Is there a middle ground?
- Index expressions can not only name fields but also apply transformation functions to them. For example, lower(fieldname) stores the lowercased contents of a field as an index key, and splitwords(fieldname) splits the contents of the field on whitespace and stores each word as an index key. Which basic transformation functions should we support? What are the use cases for your proposals? What do apps need?
- Each peer in replication has a replica uid, a name for that device. Should those IDs simply be UUIDs (as they are currently)? Can we use hostnames? Can we detect a db copied across machines? How about a db copied locally? Is identifying these cases important?
These are the questions we are currently discussing. Any comments on these, or on other issues not covered here, will be most welcome.
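For the first question, on what a Document should look like, here is a minimal Python sketch of both options. `Document` is a stand-in whose `doc_id` and `rev` attributes are assumptions (only `content` being a JSON string comes from the description of the reference implementation), and `DictDocument` is a hypothetical dict-like variant built on `collections.abc.MutableMapping`. Neither is the reference implementation's code, and the revision string is a placeholder.

```python
import json
from collections.abc import MutableMapping


class Document:
    """Stand-in document: `content` holds the document as a JSON string."""

    def __init__(self, doc_id: str, rev: str, content: str) -> None:
        self.doc_id = doc_id
        self.rev = rev
        self.content = content


# Current style: decode, edit, then re-encode by hand.
doc = Document("todo-1", "rev-placeholder", '{"title": "Buy milk", "done": false}')
data = json.loads(doc.content)
data["done"] = True
doc.content = json.dumps(data)


class DictDocument(Document, MutableMapping):
    """Hypothetical alternative: the document is itself a mutable mapping."""

    @property
    def content(self) -> str:
        # Serialise on demand, so `content` is still a JSON string for syncing.
        return json.dumps(self._data)

    @content.setter
    def content(self, value: str) -> None:
        self._data = json.loads(value)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


ddoc = DictDocument("todo-1", "rev-placeholder", '{"title": "Buy milk", "done": false}')
ddoc["done"] = True
print(doc.content == ddoc.content)  # True
```

The dict-like version is natural in Python, but it leans on a language feature (the mapping protocol) that C, for instance, has no direct equivalent of, which is the portability tension the question raises.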
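On the revision ID question: a revision ID is a version vector, with one counter per replica that has edited the document. The sketch below assumes a readable format of `replica_uid:counter` pairs joined by `|` and sorted by replica uid; the exact format in the tech preview may differ, and the helper names are hypothetical. It shows where the verbosity comes from when replica uids are UUIDs (every entry carries a full 32-character hex uid), and how comparing two vectors tells you whether one revision supersedes another or whether the two conflict.

```python
import uuid
from typing import Dict

VersionVector = Dict[str, int]  # replica uid -> number of edits made there


def parse_rev(rev: str) -> VersionVector:
    """'uidA:2|uidB:1' -> {'uidA': 2, 'uidB': 1}; '' -> {}."""
    vector = {}
    for entry in filter(None, rev.split("|")):
        replica_uid, counter = entry.rsplit(":", 1)
        vector[replica_uid] = int(counter)
    return vector


def format_rev(vector: VersionVector) -> str:
    # Sorting makes the string canonical: equal vectors give equal strings.
    return "|".join(f"{uid}:{n}" for uid, n in sorted(vector.items()))


def increment(rev: str, replica_uid: str) -> str:
    """Revision produced when `replica_uid` edits a document at `rev`."""
    vector = parse_rev(rev)
    vector[replica_uid] = vector.get(replica_uid, 0) + 1
    return format_rev(vector)


def is_newer(rev_a: str, rev_b: str) -> bool:
    """True if rev_a includes every edit in rev_b plus at least one more."""
    a, b = parse_rev(rev_a), parse_rev(rev_b)
    covers_b = all(a.get(uid, 0) >= n for uid, n in b.items())
    has_more = any(n > b.get(uid, 0) for uid, n in a.items())
    return covers_b and has_more


laptop, phone = uuid.uuid4().hex, uuid.uuid4().hex
r1 = increment("", laptop)   # created on the laptop
r2 = increment(r1, phone)    # edited on the phone after a sync
r3 = increment(r1, laptop)   # edited again on the laptop, without that sync
# Neither r2 nor r3 is newer than the other, so they conflict.
print(is_newer(r2, r1), is_newer(r2, r3), is_newer(r3, r2))  # True False False
print(len(r2))  # 69: two 32-character uids plus counters and separators
```

A more compact encoding would shrink those 69 characters, at the cost of revisions you can no longer read at a glance while debugging.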
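For the indexing questions, here is a toy evaluator for the two transformation functions named above, `lower()` and `splitwords()`, showing how nested index expressions could turn one document into several index keys. The parser, the `TRANSFORMS` table and the rule that missing or non-string fields produce no keys are assumptions for illustration, not the tech preview's implementation. In this sketch, supporting another transformation means adding one entry to the table.

```python
import re
from typing import Callable, Dict, List

# Transformations map a list of strings to a list of strings, so they nest.
TRANSFORMS: Dict[str, Callable[[List[str]], List[str]]] = {
    "lower": lambda values: [v.lower() for v in values],
    "splitwords": lambda values: [w for v in values for w in v.split()],
}

CALL = re.compile(r"^(\w+)\((.*)\)$")  # e.g. lower(splitwords(title))
FIELD = re.compile(r"^\w+$")           # e.g. title


def index_keys(expression: str, doc: dict) -> List[str]:
    """Evaluate an index expression against a decoded JSON document."""
    expression = expression.strip()
    call = CALL.match(expression)
    if call:
        name, inner = call.groups()
        if name not in TRANSFORMS:
            raise ValueError(f"unknown transformation: {name}")
        return TRANSFORMS[name](index_keys(inner, doc))
    if FIELD.match(expression):
        value = doc.get(expression)
        # Missing or non-string fields produce no index keys in this sketch.
        return [value] if isinstance(value, str) else []
    raise ValueError(f"cannot parse index expression: {expression!r}")


todo = {"title": "Buy Milk and Eggs", "done": False}
print(index_keys("lower(splitwords(title))", todo))  # ['buy', 'milk', 'and', 'eggs']
```

Each key returned would become one index entry pointing back at the document, so a search for "milk" finds this todo whichever case it was typed in.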
So, to get started, see the quickstart guide at http://people.canonical.com/~aquarius/u1db-docs/, and let us know about your ideas for applications using U1DB and your thoughts on the API!