Canonical Voices

Posts tagged with 'security'

niemeyer

In an effort to polish the recently released draft of the strepr v1 specification, I’ve spent the last couple of days in a Go reference implementation.

The implemented algorithm is relatively simple, efficient, and consumes a conservative amount of memory. The aspect of it that deserved the most attention is the efficient encoding of a float number when it carries an integer value, as covered before. The provided tests are a useful reference as well.

The API offered by the implemented package is minimal, and matches existing conventions. For example, this simple snippet will generate a hash for the stable representation of the provided value:

value := map[string]interface{}{"a": 1, "b": []int{2, 3}}
hash := sha1.New()
strepr.NewEncoder(hash).Encode(value)
fmt.Printf("%x\n", hash.Sum(nil))
// Outputs: 29a77d09441528e02a27dc498d0a757da06250a0

Along with the reference implementation comes a simple command line tool to play with the concept. It allows easily arriving at the same result obtained above by processing a JSON value instead:

$ echo '{"a": 1.0, "b": [2, 3]}' | ./strepr -in-json -out-sha1
29a77d09441528e02a27dc498d0a757da06250a0

Or YAML:

$ cat | ./strepr -in-yaml -out-sha1                 
a: 1
b:
   - 2
   - 3
29a77d09441528e02a27dc498d0a757da06250a0

Or even BSON, the binary format used by MongoDB:

$ bsondump dump.bson
{ "a" : 1, "b" : [ 2, 3 ] }
1 objects found
$ cat dump.bson | ./strepr -in-bson -out-sha1
29a77d09441528e02a27dc498d0a757da06250a0

In all of those cases the hash obtained is the same, despite the fact that the processed values were typed differently in some occasions. For example, due to its Javascript background, some JSON libraries may unmarshal numbers as binary floating point values, while others distinguish the value based on the formatting used. The strepr algorithm flattens out that distinction so that different platforms can easily agree on a common result.

To visualize (or debug) the stable representation defined by strepr, the reference implementation has a debug dump facility which is also exposed in the command line tool:

$ echo '{"a": 1.0, "b": [2, 3]}' | ./strepr -in-json -out-debug
map with 2 pairs (0x6d02):
   string of 1 byte (0x7301) "a" (0x61)
    => uint 1 (0x7001)
   string of 1 byte (0x7301) "b" (0x62)
    => list with 2 items (0x6c02):
          - uint 2 (0x7002)
          - uint 3 (0x7003)

Assuming a Go compiler and the go tool are available, the command line strepr tool may be installed with:

$ go get launchpad.net/strepr/cmd/strepr

As a result of the reference implementation work, a few clarifications and improvements were made to the specification:

  • Enforce the use of UTF-8 for Unicode strings and explain why normalization is being left out.
  • Enforce a single NaN representation for floats.
  • Explain that map key uniqueness refers to the representation.
  • Don’t claim the specification is easy to implement; floats require attention.
  • Mention reference implementation.

Read more
niemeyer

Note: This is a candidate version of the specification. This note will be removed once v1 is closed, and any changes will be described at the end. Please get in touch if you’re implementing it.

Contents


Introduction

This specification defines strepr, a stable representation that enables computing hashes and cryptographic signatures out of a defined set of composite values that is commonly found across a number of languages and applications.

Although the defined representation is a serialization format, it isn’t meant to be used as a traditional one. It may not be seen entirely in memory at once, or written to disk, or sent across the network. Its role is specifically in aiding the generation of hashes and signatures for values that are serialized via other means (JSON, BSON, YAML, HTTP headers or query parameters, configuration files, etc).

The format is designed with the following principles in mind:

Understandable — The representation must be easy to understand to increase the chances of it being implemented correctly.

Portable — The defined logic works properly when the data is being transferred across different platforms and implementations, independently from the choice of protocol and serialization implementation.

Unambiguous — As a natural requirement for producing stable hashes, there is a single way to process any supported value being held in the native form of the host language.

Meaning-oriented — The stable representation holds the meaning of the data being transferred, not its type. For example, the number 7 must be represented in the same way whether it’s being held in a float64 or in an uint16.


Supported values

The following values are supported:

  • nil: the nil/null/none singleton
  • bool: the true and false singletons
  • string: raw sequence of bytes
  • integers: positive, zero, and negative integer numbers
  • floats: IEEE754 binary floating point numbers
  • list: sequence of values
  • map: associative value→value pairs


Representation

nil = 'z'

The nil/null/none singleton is represented by the single byte 'z' (0x7a).

bool = 't' / 'f'

The true and false singletons are represented by the bytes 't' (0x74) and 'f' (0x66), respectively.

unsigned integer = 'p' <value>

Positive and zero integers are represented by the byte 'p' (0x70) followed by the variable-length encoding of the number.

For example, the number 131 is always represented as {0x70, 0x81, 0x03}, independently from the type that holds it in the host language.

negative integer = 'n' <absolute value>

Negative integers are represented by the byte 'n' (0x6e) followed by the variable-length encoding of the absolute value of the number.

For example, the number -131 is always represented as {0x6e, 0x81, 0x03}, independently from the type that holds it in the host language.

string = 's' <num bytes> <bytes>

Strings are represented by the byte 's' (0x73) followed by the variable-length encoding of the number of bytes in the string, followed by the specified number of raw bytes. If the string holds a list of Unicode code points, the raw bytes must contain their UTF-8 encoding.

For example, the string hi is represented as {0x73, 0x02, 'h', 'i'}

Due to the complexity involved in Unicode normalization, it is not required for the implementation of this specification. Consequently, Unicode strings that if normalized would be equal may have different stable representations.

binary float = 'd' <binary64>

32-bit or 64-bit IEEE754 binary floating point numbers that are not holding integers are represented by the byte 'd' (0x64) followed by the big-endian 64-bit IEEE754 binary floating point encoding of the number.

There are two exceptions to that rule:

1. If the floating point value is holding a NaN, it must necessarily be encoded by the following sequence of bytes: {0x64, 0x7f, 0xf8, 0x00 0x00, 0x00, 0x00, 0x00, 0x00}. This ensures all NaN values have a single representation.

2. If the floating point value is holding an integer number it must instead be encoded as an unsigned or negative integer, as appropriate. Floating point values that hold integer numbers are defined as those where floor(v) == v && abs(v) != ∞.

For example, the value 1.1 is represented as {0x64, 0x3f, 0xf1, 0x99, 0x99, 0x99, 0x99, 0x99, 0x9a}, but the value 1.0 is represented as {0x70, 0x01}, and -0.0 is represented as {0x70, 0x00}.

This distinction means all supported numbers have a single representation, independently from the data type used by the host language and serialization format.

list = 'l' <num items> [<item> ...]

Lists of values are represented by the byte 'l' (0x6c), followed by the variable-length encoding of the number of pairs in the list, followed by the stable representation of each item in the list in the original order.

For example, the value [131, -131] is represented as {0x6c, 0x70, 0x81, 0x03, 0x6e, 0x81, 0x03, 0x65}

map = 'm' <num pairs> [<item key> <item value>  ...]

Associative maps of values are represented by the byte 'm' (0x6d) followed by the variable-length encoding of the number of pairs in the map, followed by an ordered sequence of the stable representation of each key and value in the map. The pairs must be sorted so that the stable representation of the keys is in ascending lexicographical order. A map must not have multiple keys with the same representation.

For example, the map {"a": 4, 5: "b"} is always represented as {0x6d, 0x02, 0x70, 0x05, 0x73, 0x01, 'b', 0x73, 0x01, 'a', 0x70, 0x04}.


Variable-length encoding

Integers are variable-length encoded so that they can be represented in short space and with unbounded size. In an encoded number, the last byte holds the 7 least significant bits of the unsigned value, and zero as the eight bit. If there are remaining non-zero bits, the previous byte holds the next 7 bits, and the eight bit is set on to flag the continuation to the next byte. The process continues until there are non-zero bits remaining. The most significant bits end up in the first byte of the encoded value, which must necessarily not be 0x80.

For example, the number 128 is variable-length encoded as {0x81, 0x00}.


Reference implementation

A reference implementation is available, including a test suite which should be considered when implementing the specification.


Changes

draft1 → draft2

  • Enforce the use of UTF-8 for Unicode strings and explain why normalization is being left out.
  • Enforce a single NaN representation for floats.
  • Explain that map key uniqueness refers to the representation.
  • Don’t claim the specification is easy to implement; floats require attention.
  • Mention reference implementation.

Read more
niemeyer

Today ubuntufinder.com was updated with the latest image data for Ubuntu 13.04 and all the previous releases as well. Rather than simply hardcoding the values again, though, the JavaScript code was changed so that it imports the new JSON-based feeds that Canonical has been publishing for the official Ubuntu images that are available in EC2, thanks to recent work by Scott Moser. This means the site is always up-to-date, with no manual actions.

Although the new feeds made that quite straightforward, there was a small detail to sort out: the Ubuntu Finder is visually dynamic, but it is actually a fully static web site served from S3, and the JSON feeds are served from the Canonical servers. This means the same-origin policy won’t allow that kind of cross-domain import to be easily done without further action.

The typical workaround for this kind of situation is to put a tiny proxy within the site server to load the JSON and dispatch to the browser from the same origin. Unfortunately, this isn’t an option in this case because there’s no custom server backing the data. There’s a similar option that actually works, though: deploying that tiny proxy server in some other corner and forward the JSON payload as JSONP or with cross-origin resource sharing enabled, so that browsers can bypass the same-origin restriction, and that’s what was done.

Rather than once again doing a special tiny server for that one service, though, this time around a slightly more general tool has emerged, and as an experiment it has been put live so anyone can use it. The server logic is pretty simple, and the idea is even simpler. Using the services from jsontest.com as an example, the following URL will serve a JSON document that can only be loaded from a page that is in a location allowed by the same-origin policy:

If one wanted to load that page from a different location, it might be transformed into a JSONP document by loading it from:

Alternatively, modern browsers that support the cross-origin resource sharing can simply load pure JSON by omitting the jsonpeercb parameter. The jsonpeer server will emit the proper header to allow the browser to load it:

This service is backed by a tiny Go server that lives in App Engine so it’s fast, secure (hopefully), and maintenance-less.

Some further details about the service:

  • Results are JSON with cross-origin resource sharing by default
  • With a query parameter jsonpeercb=<callback name>, results are JSONP
  • The callback name must consist of characters in the set [_.a-zA-Z0-9]
  • Query parameters provided to jsonpeer are used when doing the upstream request
  • HTTP headers are discarded in both directions
  • Results are cached for 5 minutes on memcache before being re-fetched
  • Upstream results must be valid JSON
  • Upstream results must have Content-Type application/json or text/plain
  • Upstream results must be under 500kb
  • Both http and https work; just tweak the URL and the path accordingly

Have fun if you need it, and please get in touch before abusing it.

UPDATE: The service and blog post were tweaked so that it defaults to returning plain JSON with CORS enabled, thanks to a suggestion by James Henstridge.

Read more
Gustavo Niemeyer

In the last post, we’ve seen some security issues which exist in the Android password manager gbaSafe version 1.1.0a, by analyzing the security description provided in its web site. As described there, even though the system depends on a “master key” which might be secure, the security of the system is seriously compromised by the encouragement of very weak keys (a few digits only) in what is named an “unlock key”, used to encrypt the master key itself. All of that in an application which claims to strongly protect people’s data from unwanted eyes.

In this post, we will play a bit with the Linux-based Android OS to actually explore these security deficiencies, demonstrating that such issues are very real, and that the claims of being hard to unveil the data is unfounded. Since the most serious weakness lies in the key itself, we’ll run a simple brute force attack to try to find arbitrary unlock keys.

This procedure is actually mentioned by the author of gbaSafe himself in the web page, except he overestimates the work involved in producing such a mechanism:

Theoretically, somebody could write a program that tries to decrypt the master key by trying all possible values of the short key (with 4 digits there are only 10000 possibilities), but this would still be much work, as more information about the crypting algorithm is needed (e.g. salt bytes, iteration count).

So let’s get started.

As a first step, we’ll need the Android SDK with a working emulator (I’ve used API level 5, revision 1), and a copy of the application itself. I got a trial version of the application from AndAppStore.com.

The application downloaded is bundled within that .apk file, which is really a .zip file that may be opened up normally with any tool which understands this file format.

Once that’s done, we get access to all the information needed to run the application, including icons, interface layouts, and most importantly in this case, the bytecode which targets the Dalvik VM. This bytecode is the end result of a sequence of translations which happen when the program’s Java source code is compiled, so that’s what we’ll have to fiddle with to figure details of the application we want to investigate.

The bytecode is located inside the classes.dex file, and as expected it’s not easy to read in its native format. Luckily, though, a smart guy has already written a couple of tools, smali and baksmali, which allow people to decompile and recompile that bytecode format to/from something that is much easier to understand.

After downloading these tools, the following command should decompile the file:

$ java -jar baksmali.jar –output classes classes.dex

We now have a classes/ directory full of .smali files.

Before going any further, let’s ponder for a moment about what we want to do. A brute force attack is when we attempt sequentially many possible keys, and given the context already presented, what we’re looking after is to attempt different “unlock keys”. With that in mind, we’ll introduce a very small modification in the application so that it will attempt to enter the unlock key automatically, rather than reporting an error when the key entered in the unlock dialog is invalid.

With that in mind, after some quick research, it looks like the onClick() method within the DlgUnlock.smali file is a pretty good candidate. This is supposedly called when the button in the unlock dialog is clicked, so something interesting about the password being correct or not must happen there.

Before doing anything there, I’ve increased the number of registers in the function to 12, to get some additional registers to play with, and then initialized a register with the value zero, to serve as a monotonically increasing number (our keys!):

.method public onClick(Landroid/view/View;)V
- .registers 9
+ .registers 12
.parameter “view”
+ const/16 v9, 0×0

Then, we have to instruct the program to use these keys rather than whatever is typed in the dialog box. Some lines down, we’ll see a call to the checkUnlockKey() method, which is certainly what we’re looking after. Let’s do this now:

+ :mykey
+ invoke-static {v9}, Ljava/lang/String;->valueOf(I)Ljava/lang/String;
+ move-result-object v2
invoke-static {v2}, Lcom/gbizapps/safeA/Crypt;->checkUnlockKey(Ljava/lang/String;)I

Now, what if this key is wrong? We don’t want the master key to be removed as mentioned in the software description. We want to simply attempt the next key. With some analysis, we see that in case of errors, the next couple of lines below the above code will instruct the VM to jump to an error branch. Rather than following up with the normal error logic, we’ll increment the key, and jump back to the above code:

:cond_6c
+ add-int/lit8 v9, v9, 0×1
+ goto :mykey

Now we just have to rebundle this and put it into the emulator. I won’t go over it in too much detail here, since there’s plenty of information available online, but the steps to do that are:

  1. Recreate a modified classes.dex with smali
  2. Recreate a modified .apk file by just zipping the modified content
  3. Sign and zipalign the new .apk file
  4. Install it

And that’s it, seriously! This would be enough to break the software security if it was working correctly.

Interestingly, though, the software wasn’t working correctly with this change. Instead, it was Force Closing on certain keys. To test it out, use the master key “master key”, and the unlock key “999999″, and then once you close and open the application again, try to unlock it with the key “1175″. Instead of showing an error message, it will break badly.

Now, for the proof of concept to work, I actually had to fix the bug, which felt a bit funny to do given the context.

Looking at the traceback trough adb logcat, I found out that there was a null being dereferenced in the file Crypt.smali, so I fixed the problem by injecting some error checking at this position and jumping the flow into an existing error branch:

+ if-eqz v3, :cond_5a
const-string v4, “ucpmhkexov85MDKhdfdfFGQPYxywq7209fcrqhghjkuiopy”

With this in place came the biggest surprise of the experiment. The keys which were crashing the application were special, in the sense that they actually decode the master key successfully! That’s right: whatever the algorithm is doing, that six-digit “999999″ encrypts the master key in such a way that attempting the “1175″ key works, so even big keys are rendered extremely weak with the logic used to encrypt the master key.

At this point, I added some trivial logic to display the key found with a Toast, just to ensure the whole thing was working correctly:

Toast displaying unlock key found

Note that the key generation implemented above is a bit simplistic, in the sense that it doesn’t attempt keys with leading zeros, but this would be trivial to implement, and my intention here isn’t to actually break any keys for real, but just to show how the promised security in this application is not to be trusted at all. Just the logic above will already be enough for a brute force attack against the application, and has broken all the keys I’ve tried in mere seconds, in a slow emulator.

As a conclusion, if you want to put your data in a secure place, rather than picking an application which promises security because the salt is hidden somewhere or because it’s too much work to figure its logic, pick an open source application with logic which is publicly verifiable and has already had many eyes over it. Chances are that doing something like what was described in this post won’t be so trivial. Then, choose your keys wisely! The most secure application won’t be enough if you pick a bad key.

Read more
Gustavo Niemeyer

For some time now I’ve been wanting to research more deeply about the internals of Android. Until now, though, this was just a sentiment. Then, a couple of weeks ago I’ve finally managed to replace my iPhone for an Android phone, and that was the final motivator for me to actually get into learning more about the inner workings of the Linux-based OS.

Now, I just had to pick an actual task for digging into. The Dalvik VM is certainly one of the most innovative and advertised technical details about the OS, so something around it would be a nice start.. some kind of bytecode fiddling perhaps, but what? Luckily, even without trying too hard, I eventually stumbled upon an interesting case for researching upon.

The “victim” of this research is the application gbaSafe version 1.1.0a, which claims to protect user passwords using unbreakable algorithms (how’s that for a hint of a Snake oil case?).

Before we get into some hacking, let’s see some words on the software security by the author himself, and then render some analysis on conceptual issues on it:

The confidential data can only be decrypted if the master key is known. You should choose a long key (at least 16 characters) with mixed case and unreadable text. Of course you cannot enter this key each time you want to access the confidential data, so it is stored in the user settings encrypted with a shorter key (4 to 6 digits) and normally you only have to enter this unlock key. Theoretically it is possible to try all possible values (brute force attack), but then you must use another program, since gbaSafe deletes the encrypted master key from the user settings when you enter the unlock key wrong three times repeatedly, and then you must enter the master key. If you wrote a program to decrypt the master key, you would have to know the algorithm used, the salt bytes and iteration count (used to augment the short unlock key), which are very hard to extract from the binary program module gbaSafe.

If you have some security background, I’m sure that by now you’re already counting the issues on this single paragraph.

The most obvious issue is the fact that there’s a “strong key” and a “weak key”, and the strong key is encrypted with the weak one. This is a very common cryptography sin, as would say my friend and coworker Andreas Hasenack (a security researcher himself). A security system is only as secure as its weakest spot. It obviously makes little difference for an attacker if he has to attempt decrypting a master key or the actual data, since decrypting the master key will give access to the data.

Then, it mentions en passant that the software enforces the use of digits for the weak key. This ensures that the weak key is really weak! Four digits is basically ten thousand attempts, which is absolutely nothing for nowadays’s hardware. This number would move up to about 15 million by simply allowing upper and lowercase letters as well (which isn’t great either, but a few orders of magnitude never hurt in this scenario).

It follows up encouraging people to think that it’s actually hard to figure the algorithm and other implementation details. Considering that there’s absolutely nothing preventing people from getting their hands in the implementation itself, this is in fact asserting that the security mechanism is based on the ignorance of the attacker. Counting on the ignorance of people is bad at all times, and in a security context it’s a major error.

There’s a final security issue in this description which is a bit more subtle, but further analysis on the logic used leaves no doubt. In cryptography, the salt is supposed to increase the work needed in a brute force attack by strengthening the number of bits of the actual passphrase, in a case where the salt is actually unavailable, or at least prevent that a single large word dictionary can be used to attack several encryptions or hashes at once, in a case where the salt is known but variable. In the latter case, it helps because encrypting a single key with two different salts must be done twice, rather than once, so it increases the computational task when attacking multiple items. A salt which is known and does not change across all processed items is worth pretty close to nothing.

So, indeed, considering the many security issues here, this isn’t something I’d store my passwords or credit card numbers on, and I suggest you don’t do it either.

In my next post on this topic I’ll actually implement a trivial brute force attack to prove that these issues are very real, and that, actually, it’s not even hard to break into a security system like this.

The application author has been contacted about this blog post, since he’ll likely want to fix some of these issues.

Read more