Integrity: Hashing It Out

One of the first topics you meet when studying for Security+ is the CIA triad. This has nothing to do with the Central Intelligence Agency; it stands for three terms:

  • Confidentiality
  • Integrity
  • Availability

When it comes to protecting data, all three need to be satisfied so that stakeholders can be confident their data is protected, available when needed, and trustworthy. If you log onto a site to play your favorite game, you want to be able to log in whenever you like (Availability), you want to know that no one else can access your data or log in as you (Confidentiality), and you certainly don’t want to find that half the points or awards you accumulated in the last week are gone, which could mean your data has been tampered with (Integrity).

Hashing is one concept that touches two parts of the CIA triad: confidentiality and integrity. So what is hashing, really? The general idea is to pass some data through a one-way algorithm that produces a fixed-length jumble of letters and numbers. Because a hash is designed so that the original data cannot be read back out of it, a system can store the hash of a password instead of the password itself, which supports confidentiality. (Hashing is not encryption, though: there is no key and nothing to decrypt.) The second use is to pass the contents of a file through the algorithm, once again producing a jumble of letters and numbers. The difference here is that the same algorithm can be run again later to see whether the contents of the file have changed. If there have been changes, the hash will be different. This gives us integrity.

For a short demonstration of how hashing applies in these situations, let’s first look at a scenario where we create a username and a password. The username may or may not be hashed, but the password most likely will be. Say we want to use the password “RedWolf123”. If we run it through MD5, the result is a string of 32 letters and numbers (hexadecimal digits).

If we remove one character from this password to make it “RedWolf12”, the resulting hash looks completely different.
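To try this yourself, here is a minimal sketch using Python’s standard hashlib module. The helper name md5_hex is my own and is only there for illustration; the two passwords are the ones from the example above.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


full = md5_hex("RedWolf123")
shorter = md5_hex("RedWolf12")

print(full)             # hash of the full password
print(shorter)          # hash after removing one character
print(full == shorter)  # False: the two hashes do not match
```

Both results have the same length even though the inputs do not, and the two strings have no obvious resemblance to each other.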

To make this work at login, the application takes the password the user enters in the login form, runs it through the same hashing algorithm (MD5 in this example) and checks whether the result matches the hash saved in the database. If there is a match the user is logged in, and if not the application shows an error message such as “Username and/or Password not correct”. I will discuss this in more detail in another article; for the rest of this one we will look at the integrity concept.
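The login check described above can be sketched in a few lines of Python. Everything here is hypothetical: the users dictionary stands in for a database table, and the username and the success message are made up. MD5 is used only to match the example in this article; real systems generally use a salted, deliberately slow password-hashing algorithm instead.

```python
import hashlib
import hmac


def md5_hex(password: str) -> str:
    """Hash a password with MD5 (for illustration only)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Hypothetical stand-in for the database: username -> stored password hash.
users = {"wolf_fan": md5_hex("RedWolf123")}


def login(username: str, password: str) -> str:
    """Compare the hash of the entered password with the stored hash."""
    stored = users.get(username)
    # compare_digest compares the two strings in constant time.
    if stored is not None and hmac.compare_digest(stored, md5_hex(password)):
        return "Welcome!"
    return "Username and/or Password not correct"


print(login("wolf_fan", "RedWolf123"))
print(login("wolf_fan", "RedWolf12"))
```

Notice that the stored value is only ever the hash; the application never needs to keep the password itself.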

When a file is created, we can hash its contents to produce what is also known as a checksum. The checksum can be computed again later, or after the file has been downloaded from a remote location, to see whether the hash has changed. If it has, the file has been altered at some point. This can be an indication of tampering, and if it is an executable file it should be avoided or deleted altogether. Someone with malicious intent can alter the code or add new code, and doing so changes the file’s hash. To demonstrate this, we will create a simple file called note.txt with some content inside.

Once we save this file, we can create a checksum with the md5sum command in Linux by running md5sum note.txt.

Again the hash is a jumble of letters and numbers. If the file’s content is altered in any way, that jumble changes. (md5sum hashes only the content, so renaming the file or changing its timestamp does not affect the checksum.) Even a difference as small as one added space in the original file is enough to change the hash.

Running md5sum on the edited file then produces a new checksum.

In my example the new hash starts with 536… Even a minor change to the file turns the hash into a completely different value. This is the basis of file integrity checking. It is useful for many kinds of files, from a newly released program to PDF files containing financial data, because it becomes easier to trust that the file has not changed.
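The same experiment can be repeated without md5sum. The sketch below is a rough Python equivalent that writes a throwaway note.txt in a temporary folder, hashes it, adds one space, and hashes it again. The file_md5 helper and the file’s text are placeholders of my own, not the content used in the example above, so the hashes it prints will not match mine.

```python
import hashlib
import os
import tempfile


def file_md5(path: str) -> str:
    """Return the MD5 checksum of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "note.txt")

    # Placeholder content for the demonstration.
    with open(path, "w") as f:
        f.write("This is a simple note.\n")
    before = file_md5(path)

    # Append a single space, the smallest edit we can make.
    with open(path, "a") as f:
        f.write(" ")
    after = file_md5(path)

print("before:", before)
print("after: ", after)
print("changed:", before != after)
```

Reading the file in chunks keeps memory use low, which matters once the files being checked are large downloads rather than a one-line note.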

But wait, MD5 is considered insecure…

There are other hashing algorithms we can use, such as SHA-2 (SHA-256, for example), which is considered more secure. SHA-1 is newer than MD5, but practical collisions have been demonstrated against it as well, so it is no longer recommended for security purposes either. The main criticism of MD5 is that it is vulnerable to collision attacks. A collision attack is when someone deliberately finds two different inputs that produce the same hash value. For example, if one password produces a hash value and a completely different password produces the same hash value, that second password would theoretically work too. I chose MD5 here for a few reasons: historical perspective, since it is an earlier algorithm, and the fact that SHA-1 and SHA-2 produce longer hashes. It is somewhat easier to grasp smaller numbers when learning a new concept. For comparison, an MD5 hash is 32 hexadecimal characters long, while a SHA-256 hash (part of the SHA-2 family) is 64.
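To see the size difference for yourself, this short Python sketch hashes the example password from earlier with three algorithms and records how long each result is. The variable names are my own; "md5", "sha1" and "sha256" are algorithm names that hashlib supports.

```python
import hashlib

message = b"RedWolf123"
lengths = {}

for name in ("md5", "sha1", "sha256"):
    digest = hashlib.new(name, message).hexdigest()
    lengths[name] = len(digest)
    print(f"{name:>6} ({len(digest)} hex characters): {digest}")
```

The lengths follow from the digest sizes: 128 bits for MD5, 160 bits for SHA-1 and 256 bits for SHA-256, with each hexadecimal character representing 4 bits.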

In conclusion, hashes can be a very useful tool in security. The next time you download a file, look for a checksum such as an MD5 or SHA-1/SHA-2 hash published alongside it. Comparing it with the hash of the file you downloaded verifies the file’s integrity and gives you confidence that the file is genuine. Since this article covers the theory, later articles will show how you can apply this principle to verify files on your own, along with the nitty-gritty of its use with passwords. In the meantime it doesn’t hurt to do some research of your own on MD5 and SHA hashes in cryptography. Don’t stress too much if it all seems fuzzy to you. Teaching yourself these topics is like working out: the more you do it, the easier it becomes.