PHP MD5 – Digital Fingerprints

The PHP MD5 function implements an algorithm that takes a message of arbitrary length as input and returns a 128-bit fingerprint, or message digest, as a result. MD5 is a cryptographic hash function with many potential uses. A hash algorithm does not compress or encrypt data in any reversible sense: it reduces the input to a short, fixed-size value from which the original cannot be recovered. Its main use is to validate data integrity. Developers can use the PHP MD5 string function when a file, message or block of data needs a compact fingerprint. The result is a “message digest” string, which identifies the data much as a fingerprint identifies a person. Digests like this are a building block of digital signatures, which sign the digest with a private key, and are common in secure data transmission applications.


So what is MD5?

MD5 stands for “message digest algorithm 5.” It was invented in the early nineties by Professor Ronald Rivest as a successor to the MD4 algorithm. It is a cryptographic hash function: the algorithm takes data of arbitrary length and returns a fixed-size hash value. The message to be hashed can be any size, but the result is always 128 bits long, which PHP shows by default as 32 hexadecimal characters.
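The fixed-length property is easy to check for yourself. The following is a minimal sketch; the three inputs are arbitrary choices made for illustration and do not come from any particular application.

```php
<?php
// Inputs of very different sizes (chosen arbitrarily for this illustration).
$inputs = [
    "",                        // empty string
    "Hello",                   // five bytes
    str_repeat("a", 1000000),  // one million bytes
];

foreach ($inputs as $input) {
    $digest = md5($input);
    // Whatever the input size, the digest is 32 hex characters (128 bits).
    echo strlen($input) . " bytes in, " . strlen($digest) . " hex characters out\n";
}
```

Each line of output should report 32 hex characters, regardless of how many bytes went in.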

Using MD5 in PHP

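The PHP MD5 function is available from PHP 4 onwards. Its syntax is:

md5(string,raw)

The string argument is required. The raw argument is optional: if TRUE, the function returns the digest as 16 bytes of raw binary data; if FALSE (the default), it returns a 32-character hexadecimal string.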

PHP (MD5) sample code

<?php
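$str = "Hello";
echo "The string: ".$str."<br>";
// TRUE returns the digest as 16 raw bytes, most of which are not printable
echo "TRUE - Raw 16 character binary format: ".md5($str, TRUE)."<br>";
// Omitting the second argument (same as FALSE) returns 32 hexadecimal characters
echo "FALSE - 32 character hex number: ".md5($str)."<br>";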
?>

The result

The string: Hello
TRUE - Raw 16 character binary format: (16 raw bytes, mostly unprintable, not shown here)
FALSE - 32 character hex number: 8b1a9953c4611296a827abf8c47804d7
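Because the raw form does not display well, it can help to compare the two formats by length and to convert one into the other. This is a small sketch using the same "Hello" string; bin2hex() is a standard PHP function, and the variable names are only illustrative.

```php
<?php
$str = "Hello";

$hex = md5($str);        // default: 32 hexadecimal characters
$raw = md5($str, true);  // raw: 16 bytes of binary data

echo strlen($hex) . "\n";  // 32
echo strlen($raw) . "\n";  // 16

// bin2hex() turns the raw bytes into the same string md5() returns by default.
echo bin2hex($raw) . "\n"; // 8b1a9953c4611296a827abf8c47804d7
var_dump(bin2hex($raw) === $hex); // bool(true)
```

The raw form is mainly useful when the digest is stored in a binary column or fed into another function, since it takes half the space of the hex form.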


Hashing Algorithms – General Usage

One purpose behind any hashing algorithm, and there are many, is to produce a digital fingerprint that validates the integrity of a message. For this reason, hash values are commonly used to check data transmitted over an insecure medium such as the Internet. A received hash value that does not match the sender’s original value tells the recipient that the data has been tampered with or corrupted en route. A matching value is only meaningful if the hash itself arrived intact, which is why it is usually published or signed separately from the data.

A practical example of using hash values is downloading large files from the Internet, such as a Linux distribution. The advertised file has a hash value published alongside it. Once the download completes, the recipient computes the hash of the received file and compares it to the advertised value to confirm that the files are identical. Peer-to-peer file-sharing applications likewise use hash values to identify files spread over their networks. An identical file will always have a matching hash value, so hash values can be used to search for and match identical files. File-sharing applications identify a file not by its ambiguous or misspelled title but by its hash value.
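The download check described above can be sketched in a few lines of PHP. This is only an illustration: the temporary file stands in for a real download, and the "advertised" value is simply the MD5 of the string "Hello" rather than a checksum from any real site. md5_file() and hash_equals() are standard PHP functions.

```php
<?php
// Stand-in for a downloaded file: a temporary file with known contents.
$path = tempnam(sys_get_temp_dir(), "md5demo");
file_put_contents($path, "Hello");

// Stand-in for the checksum published on the download page (MD5 of "Hello").
$advertised = "8b1a9953c4611296a827abf8c47804d7";

// md5_file() computes the MD5 digest of the file's contents.
$actual = md5_file($path);

if (hash_equals($advertised, $actual)) {
    echo "Checksum matches: the file is identical to the advertised one\n";
} else {
    echo "Checksum mismatch: the file was corrupted or altered\n";
}

// Changing even one byte produces a different digest.
file_put_contents($path, "Hello!");
$changed = md5_file($path);
var_dump($changed === $advertised); // bool(false)

unlink($path); // clean up the temporary file
```

Note that the file name plays no part in the comparison, which is why file-sharing networks can match files by hash even when their titles differ.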

Another use for hashing algorithms is to validate the integrity of mail messages, for example to ensure that a man-in-the-middle has not intercepted and tampered with the message. Another common use is to store passwords on computers in a hashed format. When users enter their password, the software hashes the input value and compares the result with the stored hash value. One point of interest is that hashing is a one-way function, not encryption: there is no key that reverses it. You cannot derive a password from a hash except by hashing dictionary or brute-force “guesses” and comparing each result with the stored value.
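To make the “guessing” idea concrete, here is a rough sketch of a dictionary check against an unsalted MD5 hash. The stored password and the tiny word list are made up for the example; a real attack would use lists with millions of entries, but the loop is the same.

```php
<?php
// Hypothetical stored value: the MD5 hash of a weak password.
$stored = md5("letmein");

// Hypothetical word list of common passwords.
$wordlist = ["password", "123456", "letmein", "qwerty"];

$recovered = null;
foreach ($wordlist as $guess) {
    // Hash each guess and compare it with the stored hash.
    if (hash_equals($stored, md5($guess))) {
        $recovered = $guess;
        break;
    }
}

echo $recovered === null
    ? "No match in the word list\n"
    : "Password recovered: " . $recovered . "\n";
```

The hash is never reversed here. The password is found only because it appears in the list of guesses, which is why weak passwords are exposed even when they are stored hashed.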

Is MD5 a Strong Cryptographic Algorithm?

MD5 is a broken cryptographic algorithm, and the PHP MD5 function should be used with caution. MD5 was found to have potential flaws back in the mid-nineties, though these were deemed insignificant at the time. However, in 2004 researchers were able to create identical hash values from two different inputs. This is termed a collision attack: the attacker constructs two different files that share the same hash value. The US Department of Homeland Security announced that “users should avoid using the MD5 algorithm in any capacity. As previous research has demonstrated, it should be considered cryptographically broken and unsuitable for further use.”

So why is MD5 still in common use?

MD5 is still used as a means to validate that messages or files are identical, but it is not appropriate where security matters. For example, one of the suggested uses for hashing algorithms was hashing stored passwords. For that purpose MD5 has been replaced by purpose-built password hashing schemes such as bcrypt, which is based on the Blowfish cipher. The PHP documentation advises against using PHP MD5 for passwords and points to the speed of the algorithm as its weakness: a fast hash lets an attacker test enormous numbers of guesses per second. Speed is not the only problem, though. MD5 is also a broken cryptographic algorithm, and unsalted MD5 password hashes are easily overcome using modern computing power and brute force techniques.
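In PHP 5.5 and later, the Blowfish-based scheme mentioned above is available as bcrypt through the built-in password_hash() and password_verify() functions. The snippet below is a minimal sketch of how they might replace md5() for stored passwords; the example password is made up, and a real application would read it from user input.

```php
<?php
// Example password, made up for this sketch.
$password = "correct horse battery staple";

// password_hash() generates a random salt and stores it inside the result.
$hash = password_hash($password, PASSWORD_BCRYPT);

// At login, password_verify() re-hashes the input using the stored salt.
var_dump(password_verify($password, $hash));      // bool(true)
var_dump(password_verify("wrong guess", $hash));  // bool(false)

// The random salt means the same password hashes to a different string each time.
$second = password_hash($password, PASSWORD_BCRYPT);
var_dump($hash === $second);                      // bool(false)
```

Unlike md5(), bcrypt is deliberately slow and salted, so each guess costs an attacker far more and identical passwords do not share a hash.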

Another reason that MD5 persists in use today is that some environments do not require high security. These environments simply need to stop casual snooping, such as someone looking over an administrator’s shoulder at the display. In that case, hashing a password using MD5 does mitigate the risk and serves a purpose. However, there are better ways to achieve this that provide higher levels of security at the same time. Unfortunately, MD5 also persists because its severe flaws are not common knowledge and because systems that rely on it have not been withdrawn from service.

MD5 still serves a purpose, but every PHP developer must be aware that it should not be a website’s primary source of security.
