Hash functions are a fundamental part of blockchain technologies. Once you understand hash functions, other concepts such as tamper proofing, digital fingerprints and provenance become much easier to grasp.
What is a hash function?
The hash concept is actually quite simple. It’s the amount of jargon used that confuses people. Simply stated, a hash function takes some input data and creates some output data.
To expand on this concept, a hash function takes an input of any length and creates an output of fixed length.
Here is an example using a type of hash function called MD5:
It takes an input string and creates a seemingly random string of letters and numbers, “a0680c04c4eb53884be77b4e10677f2b”. This is referred to as the message digest. It is also known as the digital fingerprint, because in practice it is extremely unlikely that any other string you come across will produce the same digest. If I modify the input to “I owe my sister $2”, the message digest will be completely different. (MD5 is fine for illustration, but ways to deliberately create collisions have been found for it, so it is no longer used where security matters.)
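To make this concrete, here is a minimal sketch using Python's standard hashlib module. The original input string is not shown in this article, so "I owe my sister $1" below is an assumption chosen for illustration, and md5_hex is a hypothetical helper name.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 message digest of text as a 32-character hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Assumed original message (not taken from the article) and the modified one.
original = "I owe my sister $1"
modified = "I owe my sister $2"

d1 = md5_hex(original)
d2 = md5_hex(modified)
print(d1)
print(d2)

# Count how many hex positions differ between the two digests.
differing = sum(a != b for a, b in zip(d1, d2))
print(f"{differing} of {len(d1)} hex characters differ")
```

Changing a single character of the input changes most of the digest, which is why the digest works as a fingerprint of the exact input.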
What are the types of hash functions?
There are lots of different types of hash functions. The main ones involving the blockchain are SHA256 and RIPEMD (Bitcoin uses the 160-bit variant, RIPEMD-160). The number, such as 160 or 256, generally refers to the length of the output in bits, i.e. SHA256 will produce a 256 bit output.
When the SHA256 command is run on Linux, the output is 256 bits long, which is 64 hexadecimal characters because each hex character encodes 4 bits. Count it if you don’t believe me!
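The same check can be sketched in Python instead of the Linux command line. The helper name sha256_hex and the sample inputs are assumptions for illustration; the point is that an empty input, a single byte and a million bytes all yield a 64-character digest.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Inputs of very different lengths: 0 bytes, 1 byte, 1,000,000 bytes.
samples = [b"", b"a", b"x" * 1_000_000]

for data in samples:
    digest = sha256_hex(data)
    # Every digest is 64 hex characters, i.e. 64 * 4 = 256 bits.
    print(f"input {len(data):>9} bytes -> {len(digest)} hex chars: {digest[:16]}...")

# digest_size is reported in bytes: 32 bytes * 8 = 256 bits.
print(hashlib.sha256().digest_size * 8, "bits")
```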
The easiest way to detect if the input has changed is to compare the message digests of the 2 claimed versions. If they match, you can be confident that the two versions are identical, so a mortgage title, for example, has not been altered since its digest was recorded.
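As a rough sketch of that comparison, the snippet below records a digest for a document and later checks a claimed copy against it. The function names fingerprint and is_unchanged and the sample title text are hypothetical; hmac.compare_digest is a standard library function that compares the two strings in constant time.

```python
import hashlib
import hmac

def fingerprint(document: bytes) -> str:
    """Return the SHA-256 digest of a document as a hex string."""
    return hashlib.sha256(document).hexdigest()

def is_unchanged(document: bytes, recorded_digest: str) -> bool:
    """True if the document still hashes to the digest recorded earlier."""
    return hmac.compare_digest(fingerprint(document), recorded_digest)

# Made-up document text, used only for illustration.
title = b"Mortgage title: 1 Example Street, owner A. Person"
recorded = fingerprint(title)  # stored when the title was registered

print(is_unchanged(title, recorded))  # same bytes, so the digests match
forged = title.replace(b"A. Person", b"B. Person")
print(is_unchanged(forged, recorded))  # altered bytes, so the digests differ
```

Note that a match only shows the content is unchanged. Proving who holds the document needs other tools, such as digital signatures.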
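Many people ask how it can be possible to never come across the same message digest. It can’t be infinitely unique, can it?
The answer is that it is NOT infinitely unique. There are only a finite number of possible digests but an unlimited number of possible inputs, so collisions, i.e. two different inputs resulting in the same hash output, must exist. The secret sauce is that finding one is not feasible in practice: for a 256 bit hash, the best generic search needs on the order of 2^128 attempts, which is far more than every computer ever built could manage in a billion years. And that is good enough.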
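To get a feel for why collisions exist but are hard to find, here is a small experiment, offered as a sketch rather than a real attack. It keeps only the first few bits of a SHA-256 digest (truncated_hash and find_collision are hypothetical helpers) and searches for two inputs that agree on those bits. The number of attempts needed grows roughly with 2^(bits/2), so each extra 8 bits makes the search about 16 times longer.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int) -> int:
    """Return the first `bits` bits of the SHA-256 digest as an integer."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int):
    """Hash the messages b"0", b"1", b"2", ... until two share a truncated hash.

    Returns (first_message, second_message, attempts).
    """
    seen = {}  # truncated hash -> message that produced it
    for i in count():
        message = str(i).encode()
        h = truncated_hash(message, bits)
        if h in seen:
            return seen[h], message, i + 1
        seen[h] = message

for bits in (8, 16, 24):
    a, b, attempts = find_collision(bits)
    print(f"{bits:>2} bits: {a!r} and {b!r} collide after {attempts} attempts")
```

At 24 bits the search finishes in a fraction of a second. Extrapolating the same growth to the full 256 bits is what puts a collision out of reach.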
One way street
Another property of hash functions is that they are one way. It is really easy to calculate a message digest, but given only the digest, it is near impossible to figure out the input. Again, not impossible, but it will take another billion years or so.
Another way to think of hash functions is as a kind of compression that cannot be undone. A large input is essentially boiled down into a very short string that stands in for it, although the input cannot be recovered from that string. I can then use that digest or summary to help detect if the input has changed down the track.
How does this relate to blockchains?
Blockchains make use of hash functions everywhere. Data on the blockchain is “hashed” in each block. If the block is changed, i.e. someone tries to change how many bitcoins they own or how much they owe their sister, the hashed value will be different and everyone can detect that something has changed.
The hashed value of the previous block is used to calculate the hashed value of the current block, creating a link between the blocks.
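The linking idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Bitcoin or any real blockchain lays out its blocks: the block fields, the all-zero starting hash and the helper names block_hash, build_chain and first_invalid_block are all made up for illustration.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block

def block_hash(data: str, prev_hash: str) -> str:
    """Hash a block's data together with the previous block's hash."""
    payload = json.dumps({"data": data, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_chain(records):
    """Build a list of blocks where each block stores the previous block's hash."""
    chain = []
    prev_hash = GENESIS_PREV_HASH
    for data in records:
        current = block_hash(data, prev_hash)
        chain.append({"data": data, "prev_hash": prev_hash, "hash": current})
        prev_hash = current
    return chain

def first_invalid_block(chain):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = GENESIS_PREV_HASH
    for index, block in enumerate(chain):
        link_ok = block["prev_hash"] == prev_hash
        hash_ok = block_hash(block["data"], block["prev_hash"]) == block["hash"]
        if not (link_ok and hash_ok):
            return index
        prev_hash = block["hash"]
    return None

chain = build_chain(["Alice pays Bob 1", "I owe my sister $2", "Bob pays Carol 3"])
print(first_invalid_block(chain))  # None: the untouched chain verifies

chain[1]["data"] = "I owe my sister $0"  # tamper with the middle block
print(first_invalid_block(chain))  # 1: the stored hash no longer matches the data
```

Because each hash feeds into the next block, a cheater who recomputes the hash of the tampered block would also have to recompute every block after it, which is what the other nodes would notice.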
Not many people will talk about hash functions, but lots will talk about provenance. That is, a record of where something came from, such as organic wheat or a bolt used in making a jumbo jet, and being able to track this on the blockchain because it is immutable. It’s immutable because any change will be detected and rejected by the other nodes, and hash functions play a big part in that detection.