Wikileaks Pre-commitment Hashes: What Are They?

There has been a lot of chatter on Reddit and other forums in the last 24 hours about "pre-commitment" and "pre-commitment hashes." The chatter focuses on hashes for Wikileaks publications that Internet users concluded were "wrong." But what are these "pre-commitment hashes"? And why would it be a problem if they were wrong?

Wikileaks Pre-commitment Hashes: A Primer

According to Wikipedia, "precommitment is a strategy in which a party to a conflict uses a commitment device to strengthen its position by cutting off some of its options to make its threats more credible."

What does that mean in the case of Wikileaks? Wikileaks has published, in encrypted form, its entire database of material that hasn't yet been released. Anyone can download that database, but nobody can read what's in it without the encryption key (essentially, a password). The idea is that if something happens to Julian Assange or Wikileaks, the key would be released into the wild, and all of the material would become public.

This serves as a deterrent against moving on Assange or Wikileaks, since anyone who does so knows that they would effectively be triggering the release of the information.

Hashes, meanwhile, are used to check whether data has been altered. A hash function takes a piece of data and computes a fixed-size number (the hash) from it. For a cryptographic hash function, it is practically impossible to find two different pieces of data that produce the same hash. If someone else gets the data, they can calculate the hash themselves and compare it with the original: if even a single bit has changed, the hashes won't match, and if the data is unchanged, they will.
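As a minimal sketch of the idea, the Python snippet below uses the standard library's `hashlib` with SHA-256. The choice of SHA-256 and the sample strings are assumptions made for illustration. Changing a single letter produces a completely different hash, while hashing the same data twice always gives the same result.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

original = b"The quick brown fox jumps over the lazy dog"
tampered = b"The quick brown fox jumps over the lazy cog"  # one letter changed

h1 = sha256_hex(original)
h2 = sha256_hex(tampered)

print(h1)
print(h2)
print(h1 == sha256_hex(original))  # True: same data, same hash
print(h1 == h2)                    # False: changed data, different hash
```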

The "pre-commitment hashes" that Wikileaks releases are hashes of its entire dataset, so that anyone can make sure nobody has maliciously changed the data.
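Checking a downloaded file against a published hash can be sketched as follows, again assuming SHA-256. The helper names `file_sha256` and `matches_published` are hypothetical, and the "published" value in the demo is computed locally as a stand-in, not an actual Wikileaks hash.

```python
import hashlib
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """True if the file's SHA-256 equals the published hex string (case-insensitive)."""
    return file_sha256(path) == published_hex.strip().lower()

# Demo: write a small stand-in "dataset" to a temporary file and verify it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example dataset contents")
    demo_path = tmp.name

published = hashlib.sha256(b"example dataset contents").hexdigest()  # stand-in published value
print(matches_published(demo_path, published))  # True
print(matches_published(demo_path, "0" * 64))   # False
os.remove(demo_path)
```

Reading the file in chunks keeps memory use constant, which matters for archives that run to gigabytes.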

Mismatched Wikileaks Pre-commitment Hashes - And Wikileaks' Response

The hashes became a big concern in the past 24 hours because users on Reddit and other forums computed hashes of the pre-commitment files that Wikileaks had released, and the results didn't match the hashes Wikileaks had publicized.

This caused alarm because people took it as a sign that someone had broken into the Wikileaks dataset and altered it. Several news outlets reported on it as well.

However, Wikileaks soon explained that there was nothing to be concerned about. It makes more sense to compute the hash on the unencrypted data: Wikileaks calculated the hash of the dataset and published it, then encrypted the dataset and posted the encrypted files. Because people were hashing the encrypted files, their hashes could not match the published ones.

In other words, the hashes were calculated before encryption, so they can only be checked after the dataset has been decrypted. This makes sense because a match after decryption confirms not only that the files weren't tampered with, but also that the encryption and decryption steps didn't alter the data.
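The sequence Wikileaks described (hash first, then encrypt) can be simulated with a toy example. The `keystream_xor` function is a hypothetical stand-in cipher built from SHA-256 in counter mode so the snippet needs only the standard library; it is not the cipher Wikileaks used and should not be used to protect real data. The sketch shows why hashing the encrypted download fails to match, while hashing the decrypted data succeeds.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """TOY cipher for illustration only: XOR the data with SHA-256(key || counter) blocks.
    Applying it twice with the same key returns the original data."""
    out = bytearray()
    for counter, offset in enumerate(range(0, len(data), 32)):
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)

plaintext = b"contents of the unreleased documents"  # hypothetical dataset
key = secrets.token_bytes(32)                         # the key that stays secret

published_hash = hashlib.sha256(plaintext).hexdigest()  # 1. hash the unencrypted data and publish it
ciphertext = keystream_xor(key, plaintext)              # 2. encrypt and post the files

# Hashing the downloaded (still encrypted) file does not reproduce the published hash...
print(hashlib.sha256(ciphertext).hexdigest() == published_hash)  # False
# ...but after the key is released, decrypting and hashing does.
recovered = keystream_xor(key, ciphertext)
print(hashlib.sha256(recovered).hexdigest() == published_hash)   # True
```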

So the entire brouhaha turned out to be a non-issue.