Canonical Voices

Posts tagged with 'mongodb'

niemeyer

mgo r2016.08.01

A major new release of the mgo MongoDB driver for Go is out, including the following enhancements and fixes:


Decimal128 support

Introduces the new bson.Decimal128 type, upcoming in MongoDB 3.4 and already available in the 3.3 development releases.

Extended JSON support

Introduces support for marshaling and unmarshaling MongoDB’s extended JSON specification, which extends the syntax of JSON to include other BSON types and also allows for extensions such as unquoted map keys and trailing commas.

The new functionality is available via the bson.MarshalJSON and bson.UnmarshalJSON functions.
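
As a minimal sketch, here is how a document might be round-tripped through extended JSON. The function names come from the note above; their signatures are assumed to mirror bson.Marshal and bson.Unmarshal, and the usual gopkg.in/mgo.v2/bson, fmt, and log imports are assumed.

        // Marshal a document into extended JSON (signature assumed analogous to bson.Marshal).
        doc := bson.M{"_id": bson.NewObjectId(), "count": 42}
        data, err := bson.MarshalJSON(doc)
        if err != nil {
                log.Fatal(err)
        }
        fmt.Printf("extended JSON: %s\n", data)

        // And back again (signature assumed analogous to bson.Unmarshal).
        var out bson.M
        if err := bson.UnmarshalJSON(data, &out); err != nil {
                log.Fatal(err)
        }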

New Iter.Done method

The new Iter.Done method allows querying whether an iterator is completely done or whether there is some likelihood of more items being returned on the next call to Iter.Next.

Feature implemented by Evan Broder.
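
A minimal sketch of how Done might be used with a tailable cursor over a capped collection; the collection setup is assumed, and the polling scheme below is illustrative rather than prescriptive.

        iter := coll.Find(nil).Sort("$natural").Tail(time.Second)
        for {
                var doc bson.M
                for iter.Next(&doc) {
                        fmt.Println(doc)
                }
                if iter.Err() != nil || iter.Done() {
                        break // a real error, or the cursor is definitely finished
                }
                // Done returned false: more items may show up on the next Next call.
        }
        if err := iter.Close(); err != nil {
                log.Fatal(err)
        }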

Retry on Upsert key-dup errors

Curiously, as documented, the server can actually report a key conflict error on upserts. The driver will now retry a number of times in these situations.

Fix submitted by Christian Muirhead.

Switched test suite to daemontools

Support for supervisord has been removed and replaced by daemontools, as the latter is easier to support across environments.

Travis CI support

All pull requests and master branch changes are now being tested against several server releases.

Initial collation support in indexes

Support for collation is being widely introduced in the 3.4 release, with experimental support already visible in 3.3.

This release introduces the Index.Collation field, which may be set to an mgo.Collation value.
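
As a minimal sketch, an index with a collation might be declared as below, assuming an existing *mgo.Collection named coll. Whether Index.Collation takes a pointer, and the exact Collation field names used here, are assumptions based on the server’s collation options.

        index := mgo.Index{
                Key: []string{"name"},
                Collation: &mgo.Collation{
                        Locale:   "en",
                        Strength: 2, // compare case- and accent-insensitively
                },
        }
        if err := coll.EnsureIndex(index); err != nil {
                log.Fatal(err)
        }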

Removed unnecessary unmarshal when running commands

Code that marshaled the command result for debugging purposes was being run even when debug mode was off. This has been fixed.

Reported by John Morales.

Fixed Secondary mode over mongos

Secondary mode wasn’t behaving properly when connecting to the cluster over a mongos. This has been fixed.

Reported by Gabriel Russell.

Fixed BuildInfo.VersionAtLeast

The VersionAtLeast comparison was broken when comparing certain strings. Logic was fixed and properly tested.

Reported by John Morales.
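
A minimal sketch of the typical use of that comparison, assuming an existing *mgo.Session named session:

        info, err := session.BuildInfo()
        if err != nil {
                log.Fatal(err)
        }
        if info.VersionAtLeast(3, 2) {
                // Safe to rely on 3.2+ behavior here.
        }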

Fixed unmarshaling of ,inline structs on 1.6

Go 1.6 changed the behavior on unexported anonymous structs.

Livio Soares submitted a fix addressing that.

Fixed Apply on results containing errmsg

The Apply method was confusing a resulting object containing an errmsg field as an actual error.

Reported by Moshe Revah.

Improved documentation

Several contributions on documentation improvements and fixes.

niemeyer

One of my “official side projects” is the Go language driver for MongoDB, started a few years back while looking for a way to store data conveniently from the Go language while leaving aside some of the problems we have mapping code into table-based approaches.

Nowadays this is used in projects at Canonical, in the MongoDB tooling itself, and also in some of my own personal projects. For the personal server-side projects I’ve been using docker containers to conveniently deploy the database tooling, but this week when I went to update some of my older images pulled from the docker hub I found that the docker installed on that server was a bit too old, so it was time to update the servers.

But that got me wondering: what if I replaced all those containers by snaps? This would allow me to keep the convenience and safety of the isolation, while making a lot of things simpler. Unlike docker, snaps make the installed tooling more easily accessible to the host system (bins in the search path, processes as actual children from shell and systemd, etc), and can even use system resources directly assuming interfaces allow it (e.g. home files).

So I got into that, and perhaps ended up overdoing it a little bit. Based on my experience testing the driver, I really appreciate having all versions available for playing with, rather than just the latest one. This is what my local development system looks like right now:

$ snap list | grep 'Name\|mongo'
Name                  Version               Rev     Developer   Notes
mongo22               2.2.7                 1       niemeyer    -
mongo24               2.4.14                1       niemeyer    -
mongo26               2.6.12                1       niemeyer    -
mongo30               3.0.12                1       niemeyer    -
mongo32               3.2.7                 1       niemeyer    -
mongo33               3.3.9                 1       niemeyer    -

These are backed by upstream tarballs (snapcraft downloaded them pre-built from mongodb.com), and are all installed, running, and with tooling available for playing with. They are also published to the snap store which means you can easily make use of them locally as well. If you want to do that, here is a crash course on snaps and on how I packaged the database together specifically.

If you’re using Ubuntu, update to release 16.04, which has snaps working out of the box. For other distributions, have a look at snapcraft.io to see what command must be run to make it available.

Then, pick your version and install it. For example, if you want to play with the features just announced at MongoDB World this week, you want the unstable version 3.3.9:

$ snap install mongo33
93.59 MB / 93.59 MB [============================] 100.00 % 1.33 MB/s 

Name     Version  Rev  Developer  Notes
mongo33  3.3.9    1    niemeyer   -

After that you already have the tooling available in your path (assuming you have /snap/bin there), and the daemon started. So go ahead and fire the client to talk to the database:

$ mongo33
MongoDB shell version: 3.3.9
connecting to: 127.0.0.1:33017/test
MongoDB server version: 3.3.9
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
        http://docs.mongodb.org/
Questions? Try the support group
        http://groups.google.com/group/mongodb-user
Server has startup warnings: 
(...) 
(...) ** NOTE: This is a development version (3.3.9) of MongoDB.
(...) **       Not recommended for production.
> 

Note that the server started on the non-standard port 33017 to allow multiple versions to be running together. The pattern is that the snap mongoNN will run on port NN017.
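
For example, a Go program using mgo would reach the mongo33 snap by dialing that port explicitly. This is just a sketch, assuming the usual gopkg.in/mgo.v2 import:

        session, err := mgo.Dial("localhost:33017")
        if err != nil {
                log.Fatal(err)
        }
        defer session.Close()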

The tools installed also follow a similar pattern. Where upstream uses mongo, mongod, mongodump, etc, the snaps will have them under /snap/bin as mongo33, mongo33.d, mongo33.dump, and so on.

The systemd unit, if you want to interact with it for restarting, improving, debugging, etc, is named snap.mongo33.mongod.service, as usual for any snap that contains a daemon (snap.<snap name>.<app name>.service).

The data for the process that runs under systemd lives in the standard writable area that the confinement system opens up for snaps in general – in this specific case /var/snap/mongo33/common/. The common directory is reused untouched across snap revisions on updates, which implies snapd won’t copy the data over when doing such updates. This slightly compromises update safety, but it is worth it for bulk data that we don’t want to copy on every snap refresh.

So this ended up quite nicely. Next up is the Go server code itself, which will be packed as a snap too, for deploying it into the servers in a similar way. The exact details for how these snaps are being built are publicly available too.

If you want a hand doing something similar, we have some helpful people at #snappy on FreeNode and also snapcraft@lists.snapcraft.io.

@gniemeyer

niemeyer

As widely anticipated, Ubuntu 16.04 LTS was released this week with integrated support for snaps on classic Ubuntu.

Snappy 2.0 is a modern software platform that includes the ability to define rich interfaces between snaps that control their security and confinement, comprehensive observation and control of system changes, completion and undoing of partial system changes across restarts/reboots/crashes, macaroon-based authentication for local access and store access, preliminary development mode, a polished filesystem layout and CLI experience, modern sequencing of revisions, and so forth.

The previous post in this series described the reassuring details behind how snappy does system changes. This post will now cover Snappy interfaces, the mechanism that controls the confinement and integration of snaps with other snaps and with the system itself.

A snap interface gives one snap the ability to use resources provided by another snap, including the operating system snap (ubuntu-core is itself a snap!). That’s quite vague, and intentionally so. Software interacts with other software for many reasons and in diverse ways, and Snappy is a platform that has to mediate all of that according to user needs.

In practice, though, the mechanism is straightforward and pleasant to deal with. Without any snaps in the system, there are no interfaces available:

% sudo snap interfaces
error: no interfaces found

If we install the ubuntu-core snap alone (done implicitly when the first snap is installed), we can already see some interface slots being provided by it, but no plugs connected to them:

% sudo snap install ubuntu-core
75.88 MB / 75.88 MB [=====================] 100.00 % 355.56 KB/s 

% snap interfaces
Slot                 Plug
:firewall-control    -
:home                -
:locale-control      -
(...)
:opengl              -
:timeserver-control  -
:timezone-control    -
:unity7              -
:x11                 -

The syntax is <snap>:<slot> and <snap>:<plug>. The lack of a snap name is a shorthand notation for slots and plugs on the operating system snap.

Now let’s install an application:

% sudo snap install ubuntu-calculator-app
120.01 MB / 120.01 MB [=====================] 100.00 % 328.88 KB/s 

% snap interfaces
Slot                 Plug
:firewall-control    -
:home                -
:locale-control      -
(...)
:opengl              ubuntu-calculator-app
:timeserver-control  -
:timezone-control    -
:unity7              ubuntu-calculator-app
:x11                 -

At this point the application should work fine. But let’s instead see what happens if we take away one of these interfaces:

% sudo snap disconnect \
             ubuntu-calculator-app:unity7 ubuntu-core:unity7 

% /snap/bin/ubuntu-calculator-app.calculator
QXcbConnection: Could not connect to display :0

The installed application depends on unity7 to display itself properly, and unity7 is itself based on X11. When we disconnected the interface that gave it permission to access these resources, the application was unable to touch them.

The security minded will observe that X11 is not in fact a secure protocol. A number of system abuses are possible when we hand an application this permission. Other interfaces such as home would give the snap access to every non-hidden file in the user’s $HOME directory (those that do not start with a dot), which means a malicious application might steal personal information and send it over the network (assuming it also defines a network plug).

Some might be surprised that this is the case, but this is a misunderstanding about the role of snaps and Snappy as a software platform. When you install software from the Ubuntu archive, that’s a statement of trust in the Ubuntu and Debian developers. When you install Google’s Chrome or MongoDB binaries from their respective archives, that’s a statement of trust in those developers (these have root on your system!). Snappy is not eliminating the need for that trust, as once you give a piece of software access to your personal files, web camera, microphone, etc, you need to believe that it won’t be using those allowances maliciously.

The point of Snappy’s confinement in that picture is to enable a software ecosystem that can control exactly what is allowed and to whom in a clear and observable way, in addition to the same procedural care that we’ve all learned to appreciate in the Linux world, not instead of it. Preventing people from using all relevant resources in the system would simply force them to use that same software over less secure mechanisms instead of fixing the problem.

And what we have today is just the beginning. These interfaces will soon become much richer and more fine grained, including resource selection (e.g. which serial port?), and some of them will disappear completely in favor of more secure choices (Unity 8, for instance).

These are exciting times for Ubuntu and the software world.

@gniemeyer

niemeyer

mgo r2016.02.04

This is one of the most packed releases of the mgo driver for Go in recent times. There are new features, important fixes, and relevant internal restructuring to support the ongoing server improvements.

As usual for the driver, compatibility is being preserved both with old applications and with old servers, so updating should be a smooth experience.

Release r2016.02.04 of mgo includes the following changes which were requested, proposed, and performed by a great community.

Enjoy!

Exposed access to individual bulk error cases

Accessing the individual errors obtained while attempting a set of bulk operations is now possible via the new mgo.BulkError error type and its Cases method which returns a slice of mgo.BulkErrorCases which are properly indexed according to the operation order used. There are documented server limitations for MongoDB version 2.4 and older.

This change completes the bulk API. It can now perform optimized bulk queries when communicating with recent servers (MongoDB 2.6+), perform the same operations in older servers using compatible but less performant options, and in both cases provide more details about obtained errors.

Feature first requested by pjebs.
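
As a minimal sketch, the per-operation details might be inspected as follows, assuming an existing *mgo.Collection named coll; the Index and Err field names on the case values are assumptions on my part.

        bulk := coll.Bulk()
        bulk.Insert(bson.M{"_id": 1}, bson.M{"_id": 1}) // duplicate key on purpose
        if _, err := bulk.Run(); err != nil {
                if bulkErr, ok := err.(*mgo.BulkError); ok {
                        for _, c := range bulkErr.Cases() {
                                fmt.Println("operation", c.Index, "failed:", c.Err)
                        }
                }
        }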

New fields in CollectionInfo

The CollectionInfo type has new fields for dealing with the recently introduced document validation MongoDB feature, and also the storage engine-specific options.

Features requested by nexcode and pjebs.

New Find and GetMore command support

MongoDB is moving towards replacing the old wire protocol with a command-based implementation, and every recent release introduced changes around that. This release of mgo introduces support for the find and getMore commands which were added to MongoDB 3.2. These are exercised whenever querying the database or iterating over queried results.

Previous server releases will continue to use the classical mechanism, and the two approaches should be compatible. Please report any issues in that regard.

Do not fallback to Monotonic mode improperly

Recent driver changes adapted the Pipe.Iter, Collection.Indexes, and Database.CollectionNames methods to work with recent server releases. These changes also introduced a bug that could cause the driver to talk to a secondary server improperly, when that operation was the first operation performed on the session. This has been fixed.

Problem reported by Sundar.

Fix crash in new bulk update API

The new methods introduced in the bulk update API in the last release were crashing when a connection error occurred.

Fix contributed by Maciej Galkowski.

Enable TCP keep-alives for all connections

As requested by developers, TCP keep-alives are now enabled for all connections. No timing is specified, so the default operating system setting will be used.

Feature requested by Hunor Kovács, Berni Varga, and Martin Garton.

ChangeInfo.Updated now behaves as documented

The ChangeInfo.Updated field is documented to report the number of documents that were changed, but in fact that was not possible in old releases of the driver, since the server did not provide that information. Instead, the server only reported the number of documents matched by the selection document.

This has been fixed, so starting with MongoDB 2.6, the driver will behave as documented and report the number of documents that were indeed updated. This is related to the next driver change:

New ChangeInfo.Matched field

The new ChangeInfo.Matched field will report the number of documents that matched the selection document, whether the performed change was a removal, an update, or an upsert.

Feature requested by Žygimantas and other list members.
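
A minimal sketch of reading both counters after a multi-document update, assuming an existing *mgo.Collection named coll and the usual bson import:

        info, err := coll.UpdateAll(
                bson.M{"active": true},
                bson.M{"$set": bson.M{"checked": true}},
        )
        if err != nil {
                log.Fatal(err)
        }
        fmt.Println("matched:", info.Matched, "updated:", info.Updated)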

ObjectId now supports TextMarshaler/TextUnmarshaler

ObjectId now knows how to marshal/unmarshal itself as text in hex format when using its encoding.TextMarshaler and TextUnmarshaler interfaces.

Contributed by Jack Spirou.
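
A minimal sketch of what those interfaces provide; the method names follow directly from encoding.TextMarshaler and encoding.TextUnmarshaler.

        id := bson.NewObjectId()

        text, err := id.MarshalText() // hex form of the id
        if err != nil {
                log.Fatal(err)
        }
        fmt.Println(string(text))

        var parsed bson.ObjectId
        if err := parsed.UnmarshalText(text); err != nil {
                log.Fatal(err)
        }
        fmt.Println(parsed == id) // true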

Created GridFS index is now unique

The index on {“files_id”, “n”} automatically created for GridFS chunks when a file write completes now enforces the uniqueness of the key.

Contributed by Wisdom Omuya.

Use SIGINT in dbtest.DBServer

The dbtest.DBServer was stopping the server with SIGKILL, which would not give it enough time for a clean shutdown. It will now stop it with SIGINT.

Contributed by Haijun Wang.

Ancient field tag logic dropped

The very old field tag format parser, in use several years back even before Go 1 was released, was still around in the code base for no benefit.

This has been removed by Alexandre Cesaro.

Documentation improvements

Documentation improvements were contributed by David Glasser, Ryan Chipman, and Shawn Smith.

Fixed BSON skipping of incorrect slice types

The BSON parser was collapsing when an array value was unmarshaled into an existing field that was not of an appropriate type for such values. This has been fixed so that the bogus field is ignored and the value skipped.

Fix contributed by Gabriel Russel.

niemeyer

mgo r2015.05.29

Another release of mgo hits the shelves, just in time for the upcoming MongoDB World event.

A number of relevant improvements have landed since the last stable release:


New package for having a MongoDB server in test suites

The new gopkg.in/mgo.v2/dbtest package makes it comfortable to plug a real MongoDB server into test suites. Its simple interface consists of a handful of methods, which together allow obtaining a new mgo session to the server, wiping all existent data, or stopping it altogether once the suite is done. This design encourages an efficient use of resources, by only starting the server if necessary, and then quickly cleaning data across runs instead of restarting the server.

See the documentation for more details.

(UPDATE: The type was originally testserver.TestServer and was renamed dbtest.DBServer to improve readability within test suites. The old package and name remain working for the time being, to avoid breakage)
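
A minimal sketch of a test using the package; the SetPath, Session, Wipe, and Stop method names reflect my reading of that handful of methods, and the scratch directory is a placeholder.

        var server dbtest.DBServer

        func TestSomething(t *testing.T) {
                server.SetPath("/tmp/mgo-dbtest") // placeholder scratch directory for server files
                defer server.Stop()

                session := server.Session()
                // ... exercise code that talks to the test database here ...
                session.Close()

                server.Wipe() // drop all data so the next test starts from a clean slate
        }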

Full support for write commands

This release includes full support for the write commands first introduced in MongoDB 2.6. This was done in a compatible way, both in the sense that the driver will continue to use the wire protocol to perform writes on older servers, and also in the sense that the public API has not changed.

Tests for the new code path have been successfully run against MongoDB 2.6 and 3.0. Even then, as an additional measure to prevent breakage of existent applications, in this release the new code path will be enabled only when interacting with MongoDB 3.0+. The next stable release should then enable it for earlier releases as well, after some additional real world usage took place.

New ParseURL function

As perhaps one of the most requested features of all time, there’s now a public ParseURL function which allows code to parse a URL in any of the formats accepted by Dial into a DialInfo value which may be provided back into DialWithInfo.
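
A minimal sketch, with a placeholder URL and a tweak to the dial info before connecting:

        info, err := mgo.ParseURL("mongodb://user:pass@localhost:27017/mydb")
        if err != nil {
                log.Fatal(err)
        }
        info.Timeout = 5 * time.Second // adjust any other DialInfo fields as needed
        session, err := mgo.DialWithInfo(info)
        if err != nil {
                log.Fatal(err)
        }
        defer session.Close()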

New BucketSize field in mgo.Index

The new BucketSize field in mgo.Index supports the use of indexes of type geoHaystack.

Contributed by Deiwin Sarjas.

Handle Setter and Getter interfaces in slice types

Slice types that implement the Getter and/or Setter interfaces will now be custom encoded/decoded as usual for other types.

Problem reported by Thomas Bouldin.
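
As a minimal sketch, a slice type might customize its BSON form like this; the comma-separated representation is purely illustrative, and the strings and gopkg.in/mgo.v2/bson imports are assumed.

        type Tags []string

        // GetBSON implements bson.Getter: store the slice as one string.
        func (t Tags) GetBSON() (interface{}, error) {
                return strings.Join(t, ","), nil
        }

        // SetBSON implements bson.Setter: rebuild the slice from that string.
        func (t *Tags) SetBSON(raw bson.Raw) error {
                var s string
                if err := raw.Unmarshal(&s); err != nil {
                        return err
                }
                *t = Tags(strings.Split(s, ","))
                return nil
        }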

New Query.SetMaxTime method

The new Query.SetMaxTime method enables the use of the special $maxTimeMS query parameter, which constrains the query to stop after running for the specified time. See the method documentation for details.

Feature implemented by Min-Young Wu.
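
A minimal sketch bounding a query to one second of server-side execution, assuming an existing *mgo.Collection named coll:

        var results []bson.M
        err := coll.Find(bson.M{"status": "pending"}).
                SetMaxTime(1 * time.Second).
                All(&results)
        if err != nil {
                log.Fatal(err)
        }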

New Query.Comment method

The new Query.Comment method may be used to annotate queries for further analysis within the profiling data.

Feature requested by Mike O’Brien.

sasl sub-package moved into internal

The sasl sub-package is part of the implementation of SASL support in mgo, and is not meant to be accessed directly. For that reason, that package was moved to internal/sasl, which according to recent Go conventions is meant to explicitly flag that this is part of mgo’s implementation rather than its public API.

Improvements in txn’s PurgeMissing

The PurgeMissing logic was improved to work better in older server versions which retained all aggregation pipeline results in memory.

Improvements made by Menno Smits.

Fix connection statistics bug

Change prevents the number of slave connections from going negative on a particular case.

Fix by Oleg Bulatov.

EnsureIndex support for createIndexes command

The EnsureIndex method will now use the createIndexes command where available.

Feature requested by Louisa Berger.

Support encoding byte arrays

Support encoding byte arrays in an equivalent way to byte slices.

Contributed by Tej Chajed.

niemeyer

mgo r2014.10.12

A new release of the mgo MongoDB driver for Go is out, packed with contributions and features. But before jumping into the change list, there’s a note in the release of MongoDB 2.7.7 a few days ago that is worth celebrating:

New Tools!
– The MongoDB tools have been completely re-written in Go
– Moved to a new repository: https://github.com/mongodb/mongo-tools
– Have their own JIRA project: https://jira.mongodb.org/browse/TOOLS

So far this is part of an unstable release of the MongoDB server, but it implies that if the experiment works out every MongoDB server release will be carrying client tools developed in Go and leveraging the mgo driver. This extends the collaboration with MongoDB Inc. (mgo is already in use in the MMS product), and some of the features in release r2014.10.12 were made to support that work.

The specific changes available in this release are presented below. These changes do not introduce compatibility issues, and most of them are new features.

Fix in txn package

The bug would be visible as an invariant being broken, and the transaction application logic would panic until the txn metadata was cleaned up. The bug does not cause any data loss nor incorrect transactions to be silently applied. More stress tests were added to prevent that kind of issue in the future.

Debug information contributed by the juju team at Canonical.

MONGODB-X509 auth support

The MONGODB-X509 authentication mechanism, which allows authentication via SSL client certificates, is now supported.

Feature contributed by Gabriel Russel.

SCRAM-SHA-1 auth support

The MongoDB server is changing the default authentication protocol to SCRAM-SHA-1. This release of mgo defaults to authenticating over SCRAM-SHA-1 if the server supports it (2.7.7 and later).

Feature requested by Cailin Nelson.

GSSAPI auth on Windows too

The driver can now authenticate with the GSSAPI (Kerberos) mechanism on Windows using the standard operating system support (SSPI). The GSSAPI support on Linux remains via the cyrus-sasl library.

Feature contributed by Valeri Karpov.

Struct document ids on txn package

The txn package can now handle documents that use struct value keys.

Feature contributed by Jesse Meek.

Improved text index support

The EnsureIndex family of functions may now conveniently define text indexes via the usual shorthand syntax ("$text:field"), and Sort can use equivalent syntax ("$textScore:field") to inject the text indexing score.

Feature contributed by Las Zenow.
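
A minimal sketch using that shorthand; the field names (body, score) are placeholders of my own, and the $text query document is ordinary MongoDB query syntax rather than something specific to mgo.

        err := coll.EnsureIndex(mgo.Index{Key: []string{"$text:body"}})
        if err != nil {
                log.Fatal(err)
        }

        var results []bson.M
        err = coll.Find(bson.M{"$text": bson.M{"$search": "mongodb"}}).
                Sort("$textScore:score"). // injects the text score into the "score" field
                All(&results)
        if err != nil {
                log.Fatal(err)
        }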

Support for BSON’s deprecated DBPointer

Although the BSON specification defines DBPointer as deprecated, some ancient applications still depend on it. To enable the migration of these applications to Go, the type is now supported.

Feature contributed by Mike O’Brien.

Generic Getter/Setter document types

The Getter/Setter interfaces are now respected when unmarshaling documents on any type. Previously they would only be respected on maps and structs.

Feature requested by Thomas Bouldin.

Improvements on aggregation pipelines

The Pipe.Iter method will now return aggregation results using cursors when possible (MongoDB 2.6+), and there are also new methods to tweak the aggregation behavior: Pipe.AllowDiskUse, Pipe.Batch, and Pipe.Explain.

Features requested by Roman Konz.
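
A minimal sketch combining the new knobs on an aggregation, assuming an existing *mgo.Collection named coll:

        pipe := coll.Pipe([]bson.M{
                {"$match": bson.M{"status": "done"}},
                {"$group": bson.M{"_id": "$user", "total": bson.M{"$sum": 1}}},
        }).AllowDiskUse().Batch(100)

        iter := pipe.Iter()
        var doc bson.M
        for iter.Next(&doc) {
                fmt.Println(doc)
        }
        if err := iter.Close(); err != nil {
                log.Fatal(err)
        }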

Decoding into custom bson.D types

Unmarshaling will now work for types that are slices of bson.DocElem in an equivalent way to bson.D.

Feature requested by Daniel Gottlieb.

Indexes and CollectionNames via commands

The Indexes and CollectionNames methods will both attempt to use the new command-based protocol, and fallback to the old method if that doesn’t work.

GridFS default chunk size

The default GridFS chunk size changed from 256k to 255k, to ensure that the total document size won’t go over 256k with the additional metadata. Going over 256k would force the reservation of a 512k block when using the power-of-two allocation strategy.

Performance of bson.Raw decoding

Unmarshaling data into a bson.Raw will now bypass the decoding process and record the provided data directly into the bson.Raw value. This significantly improves the performance of dumping raw data during iteration.

Benchmarks contributed by Kyle Erf.

Performance of seeking to end of GridFile

Seeking to the end of a GridFile will now not read any data. This enables a client to find the size of the file using only the io.ReadSeeker interface with low overhead.

Improvement contributed by Roger Peppe.

Added Query.SetMaxScan method

The SetMaxScan method constrains the server to only scan the specified number of documents when fulfilling the query.

Improvement contributed by Abhishek Kona.

Added GridFile.SetUploadDate method

The SetUploadDate method allows changing the upload date at file writing time.

niemeyer

mgo r2013.09.04 released

Another release of the mgo Go driver for MongoDB hits the repository.

Here is the selection of fixes and improvements:

Improved timeouts

The timeout support has been improved to include socket-level deadlines. This better manages cases when the TCP connection with the server is silently interrupted.

The default socket deadline matches the previously documented timeout of 10 seconds for the connection establishment and one minute thereafter.

If the DialWithTimeout function is used, or the DialInfo.Timeout field is provided to DialWithInfo, the provided timeout will be used as both the initial sync timeout and the initial socket timeout.

Customizing the socket timeout independently is also possible via the new Session.SetSocketTimeout method.

omitempty flag for struct values

Marshalled structs (explicitly or via mgo) can now also use the omitempty flag in struct field tags. A struct field is considered “empty” if its current value matches the zero value for the respective field type.

Thanks to Alex S. for suggesting the feature.
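
A minimal sketch of the flag in bson struct tags:

        type Person struct {
                Name  string   `bson:"name"`
                Email string   `bson:"email,omitempty"` // omitted when ""
                Age   int      `bson:"age,omitempty"`   // omitted when 0
                Tags  []string `bson:"tags,omitempty"`  // omitted when nil
        }

        // Only the "name" field ends up in the marshaled document.
        data, err := bson.Marshal(Person{Name: "Aram"})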

mgo/txn handling of multiple operations for the same document

The mgo/txn multi-document transaction support package will now correctly handle multiple operations in the same document within a single transaction.

Before the fix, only the first operation would work due to a miscalculation of the internal sequence of transaction ids for the following documents. This means that the change should not affect existent code, unless the code was already not functioning properly.

Having this working means that doing an Insert+Update on a single document will insert the document if it doesn’t exist, and then always update the document (previously existent or newly inserted).

On the other hand, doing it in the inverted order, as Update+Insert, is a nice way to say “either update or insert”, as only one of them can possibly work depending on whether the document previously existed or not.

Thanks to Jesse van den Kieboom for reporting the problem.
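
As a minimal sketch, that Insert+Update pattern looks roughly like this with the mgo/txn API (shown in more detail in the transactions post further down this page); the collection names are placeholders and the runner setup is assumed.

        ops := []txn.Op{{
                C:      "accounts",
                Id:     "aram",
                Insert: bson.M{"balance": 0}, // created only if the document is missing
        }, {
                C:      "accounts",
                Id:     "aram",
                Update: bson.M{"$inc": bson.M{"balance": 100}}, // always applied afterwards
        }}
        err := runner.Run(ops, bson.NewObjectId(), nil)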

New bson.RawD document type

It’s easier to explain the new document type in comparison to the previously existent bson.D type, which allows marshalling a bson document with a slice of names and values, such as:

bson.D{{"field1", 42}, {"field2", "forty two"}}

Besides being a representation that is flexible and relatively compact (if compared to a map), it also allows the field ordering to be respected when being delivered to the server, which is required by some features of MongoDB.

The new bson.RawD type is similar, but rather than using interface{} element values as observed with bson.D, all values have a bson.Raw type, containing the raw bson []byte data and kind for the respective field value. This offers a convenient and efficient way to lazily process whole documents or subdocuments (by using the type in a field).

This feature was inspired by a need reported by Daniel Gottlieb.
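
A minimal sketch of lazily picking one field out of a queried document with bson.RawD, assuming an existing *mgo.Collection named coll; the Name/Value element fields are my reading of the type described above.

        var doc bson.RawD
        if err := coll.Find(bson.M{"field2": "forty two"}).One(&doc); err != nil {
                log.Fatal(err)
        }
        for _, elem := range doc {
                if elem.Name == "field1" {
                        var n int
                        if err := elem.Value.Unmarshal(&n); err != nil {
                                log.Fatal(err)
                        }
                        fmt.Println("field1 =", n) // only this field gets decoded further
                }
        }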

niemeyer

In an effort to polish the recently released draft of the strepr v1 specification, I’ve spent the last couple of days in a Go reference implementation.

The implemented algorithm is relatively simple, efficient, and consumes a conservative amount of memory. The aspect of it that deserved the most attention is the efficient encoding of a float number when it carries an integer value, as covered before. The provided tests are a useful reference as well.

The API offered by the implemented package is minimal, and matches existing conventions. For example, this simple snippet will generate a hash for the stable representation of the provided value:

value := map[string]interface{}{"a": 1, "b": []int{2, 3}}
hash := sha1.New()
strepr.NewEncoder(hash).Encode(value)
fmt.Printf("%x\n", hash.Sum(nil))
// Outputs: 29a77d09441528e02a27dc498d0a757da06250a0

Along with the reference implementation comes a simple command line tool to play with the concept. It allows easily arriving at the same result obtained above by processing a JSON value instead:

$ echo '{"a": 1.0, "b": [2, 3]}' | ./strepr -in-json -out-sha1
29a77d09441528e02a27dc498d0a757da06250a0

Or YAML:

$ cat | ./strepr -in-yaml -out-sha1                 
a: 1
b:
   - 2
   - 3
29a77d09441528e02a27dc498d0a757da06250a0

Or even BSON, the binary format used by MongoDB:

$ bsondump dump.bson
{ "a" : 1, "b" : [ 2, 3 ] }
1 objects found
$ cat dump.bson | ./strepr -in-bson -out-sha1
29a77d09441528e02a27dc498d0a757da06250a0

In all of those cases the hash obtained is the same, despite the fact that the processed values were typed differently on some occasions. For example, due to its JavaScript background, some JSON libraries may unmarshal numbers as binary floating point values, while others distinguish the value based on the formatting used. The strepr algorithm flattens out that distinction so that different platforms can easily agree on a common result.

To visualize (or debug) the stable representation defined by strepr, the reference implementation has a debug dump facility which is also exposed in the command line tool:

$ echo '{"a": 1.0, "b": [2, 3]}' | ./strepr -in-json -out-debug
map with 2 pairs (0x6d02):
   string of 1 byte (0x7301) "a" (0x61)
    => uint 1 (0x7001)
   string of 1 byte (0x7301) "b" (0x62)
    => list with 2 items (0x6c02):
          - uint 2 (0x7002)
          - uint 3 (0x7003)

Assuming a Go compiler and the go tool are available, the command line strepr tool may be installed with:

$ go get launchpad.net/strepr/cmd/strepr

As a result of the reference implementation work, a few clarifications and improvements were made to the specification:

  • Enforce the use of UTF-8 for Unicode strings and explain why normalization is being left out.
  • Enforce a single NaN representation for floats.
  • Explain that map key uniqueness refers to the representation.
  • Don’t claim the specification is easy to implement; floats require attention.
  • Mention reference implementation.

niemeyer

Note: This is a candidate version of the specification. This note will be removed once v1 is closed, and any changes will be described at the end. Please get in touch if you’re implementing it.


Introduction

This specification defines strepr, a stable representation that enables computing hashes and cryptographic signatures out of a defined set of composite values that is commonly found across a number of languages and applications.

Although the defined representation is a serialization format, it isn’t meant to be used as a traditional one. It may not be seen entirely in memory at once, or written to disk, or sent across the network. Its role is specifically in aiding the generation of hashes and signatures for values that are serialized via other means (JSON, BSON, YAML, HTTP headers or query parameters, configuration files, etc).

The format is designed with the following principles in mind:

Understandable — The representation must be easy to understand to increase the chances of it being implemented correctly.

Portable — The defined logic works properly when the data is being transferred across different platforms and implementations, independently from the choice of protocol and serialization implementation.

Unambiguous — As a natural requirement for producing stable hashes, there is a single way to process any supported value being held in the native form of the host language.

Meaning-oriented — The stable representation holds the meaning of the data being transferred, not its type. For example, the number 7 must be represented in the same way whether it’s being held in a float64 or in an uint16.


Supported values

The following values are supported:

  • nil: the nil/null/none singleton
  • bool: the true and false singletons
  • string: raw sequence of bytes
  • integers: positive, zero, and negative integer numbers
  • floats: IEEE754 binary floating point numbers
  • list: sequence of values
  • map: associative value→value pairs


Representation

nil = 'z'

The nil/null/none singleton is represented by the single byte 'z' (0x7a).

bool = 't' / 'f'

The true and false singletons are represented by the bytes 't' (0x74) and 'f' (0x66), respectively.

unsigned integer = 'p' <value>

Positive and zero integers are represented by the byte 'p' (0x70) followed by the variable-length encoding of the number.

For example, the number 131 is always represented as {0x70, 0x81, 0x03}, independently from the type that holds it in the host language.

negative integer = 'n' <absolute value>

Negative integers are represented by the byte 'n' (0x6e) followed by the variable-length encoding of the absolute value of the number.

For example, the number -131 is always represented as {0x6e, 0x81, 0x03}, independently from the type that holds it in the host language.

string = 's' <num bytes> <bytes>

Strings are represented by the byte 's' (0x73) followed by the variable-length encoding of the number of bytes in the string, followed by the specified number of raw bytes. If the string holds a list of Unicode code points, the raw bytes must contain their UTF-8 encoding.

For example, the string hi is represented as {0x73, 0x02, 'h', 'i'}

Due to the complexity involved in Unicode normalization, it is not required for the implementation of this specification. Consequently, Unicode strings that if normalized would be equal may have different stable representations.

binary float = 'd' <binary64>

32-bit or 64-bit IEEE754 binary floating point numbers that are not holding integers are represented by the byte 'd' (0x64) followed by the big-endian 64-bit IEEE754 binary floating point encoding of the number.

There are two exceptions to that rule:

1. If the floating point value is holding a NaN, it must necessarily be encoded by the following sequence of bytes: {0x64, 0x7f, 0xf8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}. This ensures all NaN values have a single representation.

2. If the floating point value is holding an integer number it must instead be encoded as an unsigned or negative integer, as appropriate. Floating point values that hold integer numbers are defined as those where floor(v) == v && abs(v) != ∞.

For example, the value 1.1 is represented as {0x64, 0x3f, 0xf1, 0x99, 0x99, 0x99, 0x99, 0x99, 0x9a}, but the value 1.0 is represented as {0x70, 0x01}, and -0.0 is represented as {0x70, 0x00}.

This distinction means all supported numbers have a single representation, independently from the data type used by the host language and serialization format.
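
In Go, the integer-valued check above translates to something along these lines (a sketch written from the definition, not taken from the reference implementation):

        // isIntegerValued reports whether a float falls under the integer rule
        // above: floor(v) == v && abs(v) != ∞. NaN fails the first test and is
        // handled by its own rule.
        func isIntegerValued(v float64) bool {
                return math.Floor(v) == v && !math.IsInf(v, 0)
        }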

list = 'l' <num items> [<item> ...]

Lists of values are represented by the byte 'l' (0x6c), followed by the variable-length encoding of the number of items in the list, followed by the stable representation of each item in the list in the original order.

For example, the value [131, -131] is represented as {0x6c, 0x02, 0x70, 0x81, 0x03, 0x6e, 0x81, 0x03}

map = 'm' <num pairs> [<item key> <item value>  ...]

Associative maps of values are represented by the byte 'm' (0x6d) followed by the variable-length encoding of the number of pairs in the map, followed by an ordered sequence of the stable representation of each key and value in the map. The pairs must be sorted so that the stable representation of the keys is in ascending lexicographical order. A map must not have multiple keys with the same representation.

For example, the map {"a": 4, 5: "b"} is always represented as {0x6d, 0x02, 0x70, 0x05, 0x73, 0x01, 'b', 0x73, 0x01, 'a', 0x70, 0x04}.


Variable-length encoding

Integers are variable-length encoded so that they can be represented in short space and with unbounded size. In an encoded number, the last byte holds the 7 least significant bits of the unsigned value, and zero as the eighth bit. If there are remaining non-zero bits, the previous byte holds the next 7 bits, and the eighth bit is set on to flag the continuation to the next byte. The process continues until there are no non-zero bits remaining. The most significant bits end up in the first byte of the encoded value, which must necessarily not be 0x80.

For example, the number 128 is variable-length encoded as {0x81, 0x00}.
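
The following Go sketch, written from the description above rather than copied from the reference implementation, encodes an unsigned value that way:

        func encodeUint(value uint64) []byte {
                // Last byte: the 7 least significant bits, with the eighth bit zero.
                buf := []byte{byte(value & 0x7f)}
                value >>= 7
                // Earlier bytes: the next 7 bits each, with the eighth bit set.
                for value > 0 {
                        buf = append([]byte{byte(value&0x7f) | 0x80}, buf...)
                        value >>= 7
                }
                return buf
        }

        // encodeUint(131) == []byte{0x81, 0x03}
        // encodeUint(128) == []byte{0x81, 0x00}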


Reference implementation

A reference implementation is available, including a test suite which should be considered when implementing the specification.


Changes

draft1 → draft2

  • Enforce the use of UTF-8 for Unicode strings and explain why normalization is being left out.
  • Enforce a single NaN representation for floats.
  • Explain that map key uniqueness refers to the representation.
  • Don’t claim the specification is easy to implement; floats require attention.
  • Mention reference implementation.

niemeyer

The very first time the concepts behind the juju project were presented, by then still under the prototype name of Ubuntu Pipes, was about four years ago, in July of 2009. It was a short meeting with Mark Shuttleworth, Simon Wardley, and myself, when Canonical still had an office in a tall building by the Thames. That was just the seed of a long road of meetings and presentations that eventually led to the codification of these ideas into what today is a major component of the Ubuntu strategy on servers.

Despite having covered the core concepts many times in those meetings and presentations, it recently occurred to me that they were never properly written down in any reasonable form. This is an omission that I’ll attempt to fix with this post while still holding the proper context in mind and while things haven’t changed too much.

It’s worth noting that I’ve stepped aside as the project technical lead in January, which makes it more likely that some of these ideas will take a turn, but they are still of historical value, and true for the time being.

Contents

This post is long enough to deserve an index, but these sections do build up concepts incrementally, so for a full understanding sequential reading is best:


Classical deployments

In a simplistic sense, deploying an application means configuring and running a set of processes in one or more machines to compose an integrated system. This procedure includes not only configuring the processes for particular needs, but also appropriately interconnecting the processes that compose the system.

The following figure depicts a simple example of such a scenario, with two frontend machines that had the Wordpress software configured on them to serve the same content out of a single backend machine running the MySQL database.

Deploying even that simple environment already requires the administrator to deal with a variety of tasks, such as setting up physical or virtual machines, provisioning the operating system, installing the applications and the necessary dependencies, configuring web servers, configuring the database, configuring the communication across the processes including addresses and credentials, firewall rules, and so on. Then, once the system is up, the deployed system must be managed throughout its whole lifecycle, with upgrades, configuration changes, new services integrated, and more.

The lack of a good mechanism to turn all of these tasks into high-level operations that are convenient, repeatable, and extensible, is what motivated the development of juju. The next sections provide an overview of how these problems are solved.


Preparing a blank slate

Before diving into the way in which juju environments are organized, a few words must be said about what a juju environment is in the first place.

All resources managed by juju are said to be within a juju environment, and such an environment may be prepared by juju itself as long as the administrator has access to one of the supported infrastructure providers (AWS, OpenStack, MAAS, etc).

In practice, creating an environment is done by running juju’s bootstrap command:

$ juju bootstrap

This will start a machine in the configured infrastructure provider and prepare the machine for running the juju state server to control the whole environment. Once the machine and the state server are up, they’ll wait for future instructions that are provided via follow up commands or alternative user interfaces.


Service topologies

The high-level perspective that juju takes about an environment and its lifecycle is similar to the perspective that a person has about them. For instance, although the classical deployment example provided above is simple, the mental model that describes it is even simpler, and consists of just a couple of communicating services:

That’s pretty much the model that an administrator using juju has to input into the system for that deployment to be realized. This may be achieved with the following commands:

$ juju deploy cs:precise/wordpress
$ juju deploy cs:precise/mysql
$ juju add-relation wordpress mysql

These commands will communicate with the previously bootstrapped environment, and will input into the system the desired model. The commands themselves don’t actually change the current state of the deployed software, but rather inform the juju infrastructure of the state that the environment should be in. After the commands take place, the juju state server will act to transform the current state of the deployment into the desired one.

In the example described, for instance, juju starts by deploying two new machines that are able to run the service units responsible for Wordpress and MySQL, and configures the machines to run agents that manipulate the system as needed to realize the requested model. An intermediate stage of that process might conceptually be represented as:

[figure: topology-step-1]

The service units are then provided with the information necessary to configure and start the real software that is responsible for the requested workload (Wordpress and MySQL themselves, in this example), and are also provided with a mechanism that enables service units that were related together to easily exchange data such as addresses, credentials, and so on.

At this point, the service units are able to realize the requested model:

[figure: topology-step-2]

This is close to the original scenario described, except that there’s a single frontend machine running Wordpress. The next section details how to add that second frontend machine.


Scaling services horizontally

The next step to match the original scenario described is to add a second service unit that can run Wordpress, and that can be achieved by the single command:

$ juju add-unit wordpress

No further commands or information are necessary, because the juju state server understands what the model of the deployment is. That model includes both the configuration of the involved services and the fact that units of the wordpress service should talk to units of the mysql service.

This final step makes the deployed system look equivalent to the original scenario depicted:

[figure: topology-step-3]

Although that is equivalent to the classic deployment first described, as hinted by these examples an environment managed by juju isn’t static. Services may be added, removed, reconfigured, upgraded, expanded, contracted, and related together, and these actions may take place at any time during the lifetime of an environment.

The way that the service reacts to such changes isn’t enforced by the juju infrastructure. Instead, juju delegates service-specific decisions to the charm that implements the service behavior, as described in the following section.


Charms

A juju-managed environment wouldn't be nearly as interesting if everything it could do were constrained by preconceived ideas that the juju developers had about what services should be supported and how they should interact among themselves and with the world.

Instead, the activities within a service deployed by juju are all orchestrated by a juju charm, which is generally named after the main software it exposes. A charm is defined by its metadata, one or more executable hooks that are called after certain events take place, and optionally some custom content.

The charm metadata contains basic declarative information, such as the name and description of the charm, relationships the charm may participate in, and configuration options that the charm is able to handle.

The charm hooks are executable files with well-defined names that may be written in any language. These hooks are run non-concurrently to inform the charm that something happened, and they give the charm a chance to react to such events in arbitrary ways. There are hooks to inform the charm that the service is supposed to be first installed, or started, or configured, or that a relation was joined or departed, and so on.

This means that in the previous example the service units depicted are in fact reporting relevant events to the hooks that live within the wordpress charm, and those hooks are the ones responsible for bringing the Wordpress software and any other dependencies up.

[figure: wordpress-service-unit]

The interface offered by juju to the charm implementation is the same, independently from which infrastructure provider is being used. As long as the charm author takes some care, one can create entire service stacks that can be moved around among different infrastructure providers.


Relations

In the examples above, the concept of service relationships was introduced naturally, because it’s indeed a common and critical aspect of any system that depends on more than a single process. Interestingly, despite it being such a foundational idea, most management systems in fact pay little attention to how the interconnections are modeled.

With juju, it’s fair to say that service relations were part of the system since inception, and have driven the whole mindset around it.

Relations in juju have three main properties: an interface, a kind, and a name.

The relation interface is simply a unique name that represents the protocol that is conventionally followed by the service units to exchange information via their respective hooks. As long as the name is the same, the charms are assumed to have been written in a compatible way, and thus the relation is allowed to be established via the user interface. Relations with different interfaces cannot be established.

The relation kind informs whether a service unit that deploys the given charm will act as a provider, a requirer, or a peer in the relation. Providers and requirers are complementary, in the sense that a service that provides an interface can only have that specific relation established with a service that requires the same interface, and vice-versa. Peer relations are automatically established internally across the units of the service that declares the relation, and enable easily clustering together these units to setup masters and slaves, rings, or any other structural organization that the underlying software supports.

The relation name uniquely identifies the given relation within the charm, and allows a single charm (and service and service units that use it) to have multiple relations with the same interface but different purposes. That identifier is then used in hook names relative to the given relation, user interfaces, and so on.

For example, the two communicating services described in examples might hold relations defined as:

[figure: wordpress-mysql-relation-details]

When that service model is realized, juju will eventually inform all service units of the wordpress service that a relation was established with the respective service units of the mysql service. That event is communicated via hooks being called on both units, in a way resembling the following representation:

[figure: wordpress-mysql-relation-workflow]

As depicted above, such an exchange might take the following form:

  1. The administrator establishes a relation between the wordpress service and the mysql service, which causes the service units of these services (wordpress/1 and mysql/0 in the example) to relate.
  2. Both service units concurrently call the relation-joined hook for the respective relation. Note that the hook is named after the local relation name for each unit. Given the conventions established for the mysql interface, the requirer side of the relation does nothing, and the provider informs the credentials and database name that should be used.
  3. The requirer side of the relation is informed that relation settings have changed via the relation-changed hook. This hook implementation may pick up the provided settings and configure the software to talk to the remote side.
  4. The Wordpress software itself is run, and establishes the required TCP connection to the configured database.

In that workflow, neither side knows for sure what service is being related to. It would be feasible (and probably welcome) to have the mysql service replaced by a mariadb service that provided a compatible mysql interface, and the wordpress charm wouldn’t have to be changed to communicate with it.

Also, although this example and many real world scenarios will have relations reflecting TCP connections, this may not always be the case. It’s reasonable to have relations conveying any kind of metadata across the related services.


Configuration

Service configuration follows the same model of metadata plus executable hooks that was described above for relations. A charm can declare what configuration settings it expects in its metadata, and how to react to setting changes in an executable hook named config-changed. Then, once a valid setting is changed for a service, all of the respective service units will have that hook called to reflect the new configuration.

Changing a service setting via the command line may be as simple as:

$ juju set wordpress title="My Blog"

This will communicate with the juju state server, record the new configuration, and consequently incite the service units to realize the new configuration as described. For clarity, this process may be represented as:

[figure: config-changed]


Taking from here

This conceptual overview hopefully provides some insight into the original thinking that went into designing the juju project. For more in-depth information on any of the topics covered here, the following resources are good starting points:

niemeyer

10gen, the company behind the MongoDB database, recently announced the availability of the MongoDB Backup Service. This is not a traditional backup service, though. Rather than simply sending scheduled snapshots of the data over to a remote system, the backup service has an agent sitting next to the database that monitors its operation log, and streams the individual operations over to the remote backup servers. This model enables the service to offer some non-conventional features, such as restoring the state of the database at any point in the last 24h, in addition to more traditional snapshots over longer periods.

There’s another interesting fact about how the system was developed: the backup agent is also the first software 10gen releases that is written in the Go language. Reportedly, the agent started as a Java project but, as the project matured, the team wanted to move to a language that compiled to native machine code to make it easier to install. After considering a few options, the team decided that Go was the best fit for its C-like syntax, strong standard library, first-class concurrency, and painless multi-platform support.

I’ve invited Daniel Gottlieb, the main 10gen engineer behind the service agent, to provide some high-level feedback about the use of Go and mgo, the MongoDB driver, and he kindly replied:

Programming the backup agent in Go and the mgo driver has been extremely satisfying. Between the lightweight syntax, the first-class concurrency and the well documented, idiomatic libraries such as mgo, Go has become my language of choice for writing small scripts up to large distributed applications.

The mgo driver is a real pleasure to use. The code is of high quality, the documentation is thorough, clear and detailed, and the API is a thoughtful, natural blend of idiomatic Go and Mongo.

Those are encouraging words, Daniel. It’s great to see not only 10gen making good use of the Go language for first-class services, but contributing to that community of developers by providing its support for the development of the Go driver in multiple ways. Good chance to say thanks!

niemeyer

I’m glad to announce experimental support for multi-document transactions in the mgo driver that integrates MongoDB with the Go language. The support is done via a driver extension, so it works with any MongoDB release supported by the driver (>= 1.8).

Features

Here is a quick highlight list to get your brain ticking before the details:

  • Supports sharding
  • Operations may span multiple collections
  • Handles changes, inserts and removes
  • Supports pre-conditions
  • Self-healing
  • No additional locks or leases
  • Works with existing data

Let’s see what these actually mean and how the goodness is done.


The problem being addressed

The typical example is a bank transaction: imagine you have two documents representing accounts for different people, and you want to transfer 100 bucks from Aram to Ben. Despite the apparent simplicity in that description, there are a number of edge cases that turn it into a non-trivial change.

Imagine an agent processing the change following these steps:

  1. Is Ben’s account valid?
  2. Take 100 bucks out of Aram’s account if its balance is above 100
  3. Insert 100 bucks into Ben’s account

Note that this description already assumes the availability of some single-document atomic operations as supported by MongoDB. Even then, how many race conditions and crash-related problems can you count? Here are some spoilers that hint at the problem complexity:

  • What if Ben cancels his account after (1)?
  • What if the agent crashes after (2)?

How it works

Thanks to the availability of single-document atomic operations, it is possible to craft a sequence of changes that manipulate documents in a way that supports multi-document transactional behavior. This works as long as the clients agree to use the same conventions.

This isn’t exactly news, though, and there’s even documentation describing how one can explore these ideas. The challenge is in crafting a generic mechanism that not only does the basics but goes beyond by supporting inserts and removes, being workload agnostic, behaving correctly on crashes (!), and yet remaining pleasant to use. That’s the territory being explored.

The implemented semantics offers an isolation level that allows non-repeatable reads to occur (a partially committed transaction is visible), but the changes are guaranteed to only be visible in the order specified in the transaction, and once any change is done the transaction is guaranteed to be applied completely without intervening changes in the affected documents (no dirty reads). Among other things, this means one can use any existing mechanism at read time.

When writing documents that are affected by the transaction mechanism, one must necessarily use the API of the new mgo/txn package, which ended up surprisingly thin and straightforward. In other words for emphasis: if you modify fields that are affected by the transaction mechanism both with and without mgo/txn, it will misbehave arbitrarily. Fields that are read or written by mgo/txn must only be changed using mgo/txn.

Using the example described above, the bank account transfer might be done as:

runner := txn.NewRunner(tcollection)
ops := []txn.Op{{
        C:      "accounts", 
        Id:     "aram",
        Assert: M{"balance": M{"$gte": 100}},
        Update: M{"$inc": M{"balance": -100}},
}, {
        C:      "accounts",
        Id:     "ben",
        Assert: M{"valid": true},
        Update: M{"$inc": M{"balance": 100}},
}}
id := bson.NewObjectId() // Optional
err := runner.Run(ops, id, nil)

The assert and update values are usual MongoDB querying and updating documents. The tcollection is a MongoDB collection that is used to atomically insert the transaction details into the database. As long as that document makes it into the database, the transaction is guaranteed to be eventually entirely applied or entirely aborted. The exact moment when this happens is defined by whether there are other transactions in progress and whether a communication problem occurs and when it occurs, as described below.

Concurrency and crash-proofness

Perhaps the most interesting piece of the puzzle when coming up with a nice transaction mechanism is defining what happens when an agent misbehaves, even more in a world where there are multiple distributed transaction runners. If there are locks, someone must unlock when a runner crashes, and must know the difference between running slowly and crashing. If there are leases, the lease boundary becomes an issue. In both cases, the speed of the overall system would become bounded by the speed of the slowest runner.

Instead of falling onto those issues, the implemented mechanism observes the transactions being attempted on the affected documents, orders them in a globally agreed way, and pushes all of their operations concurrently.

To illustrate the behavior, imagine again the described scenario of bank transferences:

In this diagram there are two transactions being attempted, T1 and T2. The first is a transference from Aram to Ben, and the second is a transference from Ben to Carl. If a runner starts executing T2 while T1 is still being applied by a different runner, the first runner will pick T1 up and complete it before starting to work on T2 which is its real goal. This works even if the original runner of T1 died while it was in progress. In reality, there’s little difference between the original runner of T1 and another runner that observes T1 on its way.

There’s a chance that T1’s runner died too soon, though, and it hasn’t had a chance to even start the transaction by tagging Ben’s account document as participating in it. In that case, T2 will be pushed forward by its own runner independently, since there’s nothing on its way. T1 isn’t lost, though, and it may be resumed at any point by calling the runner’s Resume or ResumeAll methods.
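
A minimal sketch of that resumption, for example from a periodic maintenance task; the runner setup matches the earlier example.

        runner := txn.NewRunner(tcollection)
        if err := runner.ResumeAll(); err != nil {
                log.Fatal(err)
        }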

The whole logic is implemented without introducing any new globally shared point of coordination. It works if documents are in different collections, different shards, and it works even if the transaction collection itself is sharded across multiple backends for scalability purposes.

The testing approach

While a lot of thinking was put into the way the mechanism works, this is of course non-trivial and bug-inviting logic. In an attempt to nail down bugs early on, a testing environment was put in place to simulate multiple runners in a conflicting workload. To make matters more realistic, this simulation happens in a harsh scenario with faults and artificial slowdowns being randomly injected into the system. At the end, the result is evaluated to see if the changes performed respected the invariants established.

While hundreds of thousands of transactions have been successfully run in this fashion, the package should still be considered experimental at this point, and its API is still prone to change.

There’s one race

There’s one known race that’s worth mentioning, and it was consciously left there for the moment as a tradeoff. The race shows itself when inserting a new document, at the point in time when the decision has been made that the insert was genuinely good. At this exact moment, if that runner is frozen for long enough to allow a different runner to insert the document and remove it again, and the original runner is then unfrozen without any errors or timeouts, it will naturally go on and insert the new document.

There are multiple solutions for this problem, but they present their own disadvantages. One solution would be to manipulate the document instead of removing it, but that would leave the collection with ghost content that has to be cared for, and that’s an unwanted side effect. A second solution would be to use the internal applyOps machinery that MongoDB uses in its sharding implementation, but that would mean that collections affected by transactions couldn’t be sharded, which is another unwanted side effect (please vote for SERVER-1439 so we can use it).

Have fun!

I hope the package serves you well, and if you would like to talk further about it, please join the mgo-users mailing list and drop a message.
