What Is a Hash? SHA-256, MD5, and Checksums Explained
Learn what hash functions do, when to use SHA-256 instead of MD5, and how to verify checksums without treating cryptography like black magic.
The one-way fingerprint machine
Here’s an analogy I keep coming back to: imagine a machine that takes any object — a marble, a car, a blue whale — and spits out a fixed-length serial number. The serial number is always exactly 64 characters. It’s unique to that object (in practice). And here’s the crucial part: given only the serial number, there is absolutely no way to reconstruct the original object.
That’s a hash function. It takes any input (a password, a file, an entire database dump) and produces a fixed-length output called a hash, a digest, or a checksum. The same input always produces the same output. Change a single bit of the input — one letter, one pixel, one flipped zero — and the output changes completely. And you cannot reverse the process to recover the original input.
If you’ve been writing code for any length of time, you’ve already used hashing, whether you realised it or not. Every git commit is identified by a SHA-1 hash. Every password you’ve saved properly is stored as a hash. Every ISO you’ve downloaded with a checksum on the download page was verified using a hash. It’s one of those foundational concepts that quietly holds half the internet together.
I learned this the dull but useful startup way: nothing feels more “we are all professionals here” than a release checklist that ends with somebody pasting matching checksums into chat so everyone can stop panicking about whether the build artefact changed under their feet.
How a hash function works (without the maths degree)
At a high level, a hash function does three things:
- Takes variable-length input. Your input can be a single character or a 4 GB video file. Doesn’t matter.
- Produces fixed-length output. MD5 always produces 128 bits (32 hex characters). SHA-256 always produces 256 bits (64 hex characters). The output length never changes regardless of input size.
- Is deterministic. The same input always yields the same output. Hash “hello” with SHA-256 today and again next year — identical result, every time.
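To make those three behaviours concrete, here is a minimal sketch using Python's standard `hashlib` module. The helper name `sha256_hex` is my own for illustration, not something from a particular tool.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as lowercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_digest = sha256_hex("hello")
long_digest = sha256_hex("hello" * 100_000)  # roughly 500 KB of input

print(short_digest)
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

# Fixed-length output: both digests are 64 hex characters.
print(len(short_digest), len(long_digest))

# Deterministic: hashing the same input again gives the same digest.
print(sha256_hex("hello") == short_digest)

# MD5 follows the same pattern with a 32-character hex digest.
print(len(hashlib.md5(b"hello").hexdigest()))
```

Note that the text is encoded to bytes first. Hash functions work on bytes, so the same string in a different encoding would give a different digest.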
Under the hood, hash algorithms use a series of bitwise operations, modular arithmetic, and compression functions to thoroughly scramble the input. The specific operations differ between algorithms, but the design goal is always the same: make the output look completely random and make it computationally infeasible to reverse.
A good cryptographic hash function has three essential properties:
- Pre-image resistance — given a hash output, you can’t work backwards to find the input. This is the “one-way” property.
- Collision resistance — it should be practically impossible to find two different inputs that produce the same hash. (Theoretically collisions must exist — you’re mapping infinite inputs to a finite set of outputs — but a good algorithm makes finding them absurdly expensive.)
- Avalanche effect — change one bit of the input and roughly half the output bits flip. No gradual changes, no patterns. The output looks completely unrelated to the original.
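The avalanche effect is easy to measure yourself. The sketch below counts how many output bits differ between the SHA-256 digests of "hello" and "Hello", two inputs that differ by a single bit (lowercase and uppercase ASCII letters are one bit apart). The function name `differing_bits` is hypothetical, chosen for this example.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    # XOR leaves a 1 wherever the two bytes disagree; count those 1s.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


d1 = hashlib.sha256(b"hello").digest()
d2 = hashlib.sha256(b"Hello").digest()

flipped = differing_bits(d1, d2)
print(f"{flipped} of {len(d1) * 8} output bits differ")
```

You should see a count somewhere near half of the 256 output bits, which is what you would expect if the two digests were unrelated random values.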
SHA-256: the current workhorse
SHA-256 (Secure Hash Algorithm, 256-bit) is part of the SHA-2 family, designed by the NSA and published by NIST in 2001. It’s the algorithm you’ll encounter most often in modern security contexts: TLS certificates, blockchain (Bitcoin’s proof-of-work runs on SHA-256), package manager integrity checks, and digital signatures.
The “256” refers to the output size: 256 bits, rendered as 64 hexadecimal characters. That gives you 2²⁵⁶ possible outputs, roughly 1.16 × 10⁷⁷, which is within a few orders of magnitude of common estimates for the number of atoms in the observable universe (around 10⁸⁰). Because of the birthday bound, a brute-force collision search needs on the order of 2¹²⁸ hash computations rather than 2²⁵⁶, but that is still far beyond current (and foreseeable) computing capacity.
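If you want a feel for the numbers, Python's arbitrary-precision integers make the arithmetic painless. This is a back-of-the-envelope sketch: the rate of 10¹⁸ hashes per second is a made-up figure for illustration, not a measurement of any real system.

```python
# Size of the SHA-256 output space.
outputs = 2 ** 256
print(f"{outputs:.3e}")   # 1.158e+77
print(len(str(outputs)))  # 78 decimal digits

# A generic (birthday) collision search needs on the order of 2**128 hashes.
birthday_bound = 2 ** 128

# Hypothetical attacker speed, assumed purely for illustration.
hashes_per_second = 10 ** 18
seconds_per_year = 60 * 60 * 24 * 365

years = birthday_bound / (hashes_per_second * seconds_per_year)
print(f"about {years:.1e} years at that rate")
```

Even at that invented rate, the search takes trillions of years, which is the practical meaning of "computationally infeasible".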
SHA-256 is usually somewhat slower than MD5 in plain software implementations, though the exact gap depends heavily on the implementation and the hardware, and CPUs with dedicated SHA instructions can narrow it considerably. In practice, this difference is negligible for almost every use case. You’re hashing a file, not rendering a frame in a game engine. The security difference, on the other hand, is not negligible at all.
Let’s see it in action. The SHA-256 Generator lets you hash any text input and see the resulting 64-character digest instantly.
Try hashing “hello” and then “Hello” (capital H). Notice how the two outputs are completely different — not similar, not shifted by one character, but utterly unrecognisable from each other. That’s the avalanche effect in action. Now try hashing a longer string. The output is still exactly 64 characters. A single word and an entire paragraph produce digests of identical length.
That matters in practice because SHA-256 gives you a stable fingerprint you can compare later. If you hash a release file before upload and your colleague hashes the downloaded copy afterwards, matching digests tell you the file arrived intact. Non-matching digests tell you something changed, and you should assume the file is wrong until you know why.
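Here is a minimal sketch of fingerprinting a file in Python. It reads the file in chunks so that a multi-gigabyte release does not have to fit in memory. The function name `sha256_file` and the 1 MiB chunk size are my own choices, not a standard.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:  # binary mode: hash the exact bytes on disk
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

You and your colleague would each call `sha256_file` on your own copy and compare the two strings. Feeding the data in chunks gives the same digest as hashing the whole file at once.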
MD5: fast, familiar, and broken
MD5 (Message-Digest Algorithm 5) was designed by Ronald Rivest in 1991 and was the dominant hash algorithm for over a decade. It produces a 128-bit hash (32 hex characters), and for a long time it was considered perfectly adequate for everything from password storage to file verification.
Then, in 2004, researchers demonstrated practical collision attacks against MD5. By 2008, a team used MD5 collisions to create a rogue CA certificate, a real-world demonstration that an attacker could have forged trusted certificates for arbitrary websites. The algorithm’s collision resistance was fundamentally broken.
So is MD5 useless? Not entirely, but you need to be precise about what you’re using it for:
- For security purposes (passwords, digital signatures, certificates): MD5 is broken. Do not use it. Full stop.
- For non-adversarial integrity checks (did my file transfer get corrupted?): MD5 is still technically functional. A random bit flip during transfer won’t produce a collision. But even here, SHA-256 is a better choice because it costs you almost nothing extra and protects against deliberate tampering too.
- For hash tables and data structures: MD5 works fine as long as the keys come from data you control. If an attacker can choose the inputs, treat it like any other security use.
My rule of thumb from my engineering days: if you’re reaching for MD5 and you have to think about whether it’s safe, just use SHA-256 instead. The performance difference is irrelevant for any realistic workload, and you’ll never have to justify the choice in a security review.
Try the MD5 Generator to compare. Hash the same inputs you used above and notice the shorter 32-character output.
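The outputs are shorter, and that’s not just cosmetic. With 128 bits of output, a generic birthday search for a collision needs around 2⁶⁴ attempts, against roughly 2¹²⁸ for SHA-256. The short digest is not the main problem, though: MD5’s practical break comes from structural weaknesses in the algorithm itself, which let attackers craft collisions in seconds on ordinary hardware, far faster than any brute-force search.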
The practical takeaway is simple. If a modern tool gives you a choice, choose SHA-256 or stronger. If an older workflow still publishes only MD5, you can use it to catch accidental corruption, but you should not confuse “the MD5 matches” with “this is cryptographically trustworthy.”
Comparing algorithms side by side
If you want to hash the same input with multiple algorithms and compare the outputs, the Hash Generator supports several algorithms in one place.
This is useful when you need to match a specific algorithm. Download pages, for instance, often provide checksums in SHA-256 or SHA-512. Package lock files might use SHA-512. Older systems might still reference MD5. Being able to generate the right algorithm’s digest and compare it character-by-character is the core of file integrity verification.
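If you would rather do the side-by-side comparison in code, `hashlib.new` accepts the algorithm name as a string. The sketch below assumes a set of five common algorithms; the tuple `ALGORITHMS` and the function `digests` are names I made up for this example.

```python
import hashlib

# An assumed selection of commonly published algorithms.
ALGORITHMS = ("md5", "sha1", "sha256", "sha384", "sha512")


def digests(data: bytes) -> dict:
    """Hash the same bytes with each algorithm and return name -> hex digest."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}


for name, hex_digest in digests(b"hello").items():
    print(f"{name:>6}  {len(hex_digest):>3} hex chars  {hex_digest}")
```

The digest length is often the quickest clue to which algorithm a download page used: 32 hex characters suggests MD5, 40 suggests SHA-1, 64 suggests SHA-256, and 128 suggests SHA-512.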
It is also a good sanity check when you are debugging tooling. If your CI pipeline says the artefact hash changed but nobody can explain why, hashing the exact payload locally with the same algorithm usually tells you whether you have a real content change, a line-ending change, or a packaging step doing something “helpful” behind your back.
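The line-ending case is worth seeing once, because the two files look identical in most editors. This is a small sketch with made-up sample content; `sha256_hex` is a hypothetical helper.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hex digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


unix_text = b"line one\nline two\n"         # LF line endings
windows_text = b"line one\r\nline two\r\n"  # CRLF line endings

# Same visible text, different bytes, so the digests differ.
print(sha256_hex(unix_text) == sha256_hex(windows_text))  # False

# Normalising the line endings makes the digests match again.
normalised = windows_text.replace(b"\r\n", b"\n")
print(sha256_hex(normalised) == sha256_hex(unix_text))    # True
```

If normalising line endings makes a mystery mismatch disappear, the culprit is probably a checkout or packaging setting rather than a real content change.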
How do you verify a checksum in practice?
This is the part people tend to skip because the release page already has a scary-looking wall of hex and everyone would like to get on with their day. Fair enough. But the actual process is simple:
- Download the file.
- Find the published checksum and note the algorithm used.
- Hash your local copy with the same algorithm.
- Compare the two digests exactly, character for character.
If they match, the file you downloaded is the file the publisher expected you to have. If they do not, stop there. Do not install it, do not assume the mismatch is “probably formatting”, and do not shrug because the file opened anyway.
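The four steps above can be sketched in a few lines of Python. The function name `verify_checksum` is hypothetical. It assumes the publisher gives you a hex digest and that you know which algorithm they used.

```python
import hashlib
import hmac


def verify_checksum(path: str, published_hex: str, algorithm: str = "sha256") -> bool:
    """Hash a local file and compare it with a published hex digest."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large downloads do not need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)

    # Tolerate surrounding whitespace and uppercase hex from copy and paste;
    # every hex character must still match.
    expected = published_hex.strip().lower()
    return hmac.compare_digest(digest.hexdigest(), expected)
```

A `False` result means stop, exactly as described above. The comparison uses `hmac.compare_digest` out of habit; a plain `==` would give the same answer for a public checksum.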
On macOS or Linux, that often means using shasum -a 256 filename or sha256sum filename. On PowerShell, it is commonly Get-FileHash filename -Algorithm SHA256. The exact command matters less than the habit: always hash with the same algorithm the publisher used.
One more subtle point: a matching checksum proves equality, not legitimacy. If you downloaded a compromised file from a compromised page that also published the compromised checksum, those two values can still match perfectly. Checksums are excellent for integrity. Authenticity still depends on where you got the file, whether the site was trustworthy, and whether signatures or trusted package channels back it up.
Hashing vs encryption: the distinction that matters
This trips up a surprising number of developers, so let’s be explicit: hashing is not encryption.
Encryption is reversible. You encrypt data with a key, and someone with the correct key can decrypt it back to the original. The whole point is that the process is two-way.
Hashing is one-way. There is no key. There is no “un-hashing.” Once data is hashed, the original input is gone. The only thing you can do is hash a candidate input and check whether the output matches.
This is exactly why hashing is the right tool for password storage. When a user creates a password, you hash it and store the hash. When they log in, you hash whatever they typed and compare it to the stored hash. If the hashes match, the password was correct. At no point do you need to recover the original password — and crucially, if your database is breached, the attacker gets hashes, not passwords.
(Modern password hashing uses specialised algorithms like bcrypt, scrypt, or Argon2, which add deliberate slowness and salt to resist brute-force attacks. SHA-256 alone is too fast for password hashing — speed is a feature for file integrity but a liability for password security, because attackers can try billions of guesses per second.)
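As a rough illustration of the salted, deliberately slow approach, here is a sketch using `hashlib.scrypt` from Python's standard library (available when Python is built against a recent OpenSSL). The cost parameters in `SCRYPT_PARAMS` and the function names are assumptions for the example, not tuned recommendations.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters only; real values should come from current
# guidance and your own benchmarking.
SCRYPT_PARAMS = {"n": 2 ** 14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple:
    """Return (salt, derived_key) for storage. The password is never stored."""
    salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key


def check_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key from the login attempt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, stored_key)
```

Because each password gets its own salt, two users with the same password end up with different stored keys, and an attacker cannot reuse one precomputed table against the whole database.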
Practical uses: where you’ll encounter checksums
Here’s where hashing moves from theory to your daily workflow:
- Verifying downloads. Software projects publish SHA-256 checksums alongside their releases. You download the file, hash it locally, and compare the two digests. If they match, the file wasn’t corrupted or tampered with in transit. This is standard practice for Linux ISOs, language runtimes, and database installers.
- Package managers. npm, pip, cargo, and apt all use hash-based integrity checks. When you run npm install, the lock file contains a hash for every package. If a package’s contents don’t match its recorded hash, the install fails. This protects against supply-chain attacks where a malicious actor replaces a legitimate package with a compromised version.
- Git. Every commit, tree, and blob in a Git repository is identified by a SHA-1 hash (Git is gradually migrating to SHA-256). When you run git log and see those 40-character commit IDs, you’re looking at hashes of the commit content. This is how Git guarantees data integrity: if even one bit changes in a file’s history, every subsequent hash changes, and the discrepancy is immediately detectable.
- Digital signatures. When you sign a document or a software release, the signing process hashes the content first, then signs that hash with your private key. The recipient hashes the content independently and uses your public key to check the signature against that hash. If the check passes, the content hasn’t been altered since you signed it.
- Blockchain. Every block in a blockchain contains the hash of the previous block, forming a chain where altering any block invalidates every block after it. Bitcoin’s mining process is literally a competition to find an input that produces a SHA-256 hash below a target threshold.
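The Git item is the easiest of these to reproduce by hand. For repositories using the SHA-1 object format, a blob's ID is the SHA-1 of a short header (the word "blob", the content length, and a null byte) followed by the file content. The function name `git_blob_id` is my own. Treat this as a sketch of the idea rather than a replacement for `git hash-object`.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a blob (SHA-1 object format)."""
    # Git hashes "blob <size in bytes>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"hello world\n"))
```

You can check the result against your own Git install by running `git hash-object` on a file with the same content. Because the ID depends only on the bytes, identical files always share one blob, and any change to the content produces a different ID.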
Quick reference: SHA-256 vs MD5
| Property | SHA-256 | MD5 |
|---|---|---|
| Output size | 256 bits (64 hex chars) | 128 bits (32 hex chars) |
| Speed | Slightly slower | Slightly faster |
| Collision resistance | No known practical attacks | Broken since 2004 |
| Safe for security use | Yes | No |
| Safe for integrity checks | Yes | Acceptable (non-adversarial only) |
| Common uses | TLS, blockchain, package managers, digital signatures | Legacy systems, non-security checksums, hash tables |
Key takeaways
- A hash function is a one-way fingerprint. Same input always produces the same fixed-length output. You can’t reverse it.
- SHA-256 is the default choice for anything security-related. It’s well-tested, widely supported, and has no known practical attacks.
- MD5 is broken for security but still functional for non-adversarial integrity checks and data structures. When in doubt, use SHA-256.
- Hashing is not encryption. Encryption is two-way (with a key). Hashing is one-way (no key, no reversal).
- Checksums protect file integrity. When you download software and verify its hash, you’re confirming the file hasn’t been corrupted or tampered with.
- A checksum match proves sameness, not trustworthiness. It tells you your local file matches the published digest, not that the source itself was safe.
- Modern password storage uses specialised hashing (bcrypt, Argon2) — not raw SHA-256 or MD5 — because general-purpose speed is a vulnerability when defending against brute-force attacks.
The next time you see a 64-character hex string on a download page, you’ll know exactly what it is: a mathematical fingerprint, designed to be unique, irreversible, and your first line of defence against file tampering. Go ahead and verify it — the generators above make it trivially easy.
This guide covers the concepts and common algorithms at a practical level. For production security decisions — password hashing strategies, certificate pinning, or cryptographic protocol design — always consult your organisation’s security team or a qualified cryptography specialist.