How Password Hashing Actually Works: Salts, Bcrypt, and Rainbow Tables
The Fundamental Mistake: Storing Passwords You Can Read Back
In 2012, LinkedIn lost 6.5 million password hashes. By 2016, researchers discovered the actual breach was 117 million accounts. The attackers cracked a significant chunk of those hashes within days — not because they broke SHA-1, but because LinkedIn skipped a single step: salting. That omission, combined with a weak hashing algorithm, turned a bad breach into a catastrophic one.
This is the story of what password storage should look like, why plaintext is obviously disqualifying, why simple hashing isn't enough, and what bcrypt (and its successors) actually do under the hood.
Hashing Is Not Encryption — This Distinction Matters
Developers new to security sometimes conflate hashing and encryption. They're different operations with different purposes, and mixing them up leads to real vulnerabilities.
Encryption is reversible. You encrypt data with a key, and anyone with that key can decrypt it back to plaintext. AES-256 is symmetric encryption — same key encrypts and decrypts. RSA is asymmetric. Either way, the original data is recoverable. This is useful for things like credit card numbers you need to charge again, or files you need to retrieve. It is catastrophically wrong for passwords, because if your key leaks, every password in your database is immediately readable.
Hashing is a one-way function. You feed in arbitrary data, you get back a fixed-length digest. There's no "unhash" operation. SHA-256 always produces 64 hex characters regardless of whether your input is "a" or the complete works of Shakespeare. The math behind this involves modular arithmetic and bitwise operations designed to be computationally cheap to compute forward and practically impossible to reverse.
When a user logs in, you hash what they typed and compare it against the stored hash. If they match, the password is correct. You never need the original — so you should never store it.
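As a minimal sketch of the two properties described above (fixed-length output and compare-by-rehashing), the snippet below uses Python's standard `hashlib`. The helper name `sha256_hex` and the sample inputs are illustrative assumptions. Bare SHA-256 appears here only to show the mechanics; it is not a suitable way to store passwords.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

short_digest = sha256_hex(b"a")
long_digest = sha256_hex(b"To be, or not to be, that is the question. " * 10_000)

# Both digests are 64 hex characters (256 bits), whatever the input size.
print(len(short_digest), len(long_digest))

# Login check: hash what was typed and compare it with the stored digest.
stored = sha256_hex(b"correct horse")
matches = sha256_hex(b"correct horse") == stored
rejects = sha256_hex(b"wrong guess") == stored
print(matches, rejects)
```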
Why Plain Hashing Fails: The Rainbow Table Problem
Here's where many developers stop thinking and ship something broken: they hash passwords with MD5 or SHA-256 and call it a day. That's not enough.
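A rainbow table is a precomputed structure that maps hashes back to their source strings. Strictly speaking, it stores chains of hashes so that it can trade a little lookup time for far less storage, but the effect is the same as a giant lookup table. Generating one takes enormous time upfront, but once built, recovering any password it covers is nearly instantaneous. Attackers don't "crack" your hash in the traditional sense; they just look it up.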
Consider how this works in practice. The MD5 hash of "password123" is always 482c811da5d5b4bc6d497ffa98491e38. If an attacker gets your database and sees that hash, they don't need to reverse any math. They query their rainbow table, and "password123" comes back in milliseconds. The hash that was supposed to protect the password becomes essentially meaningless.
Rainbow tables exist for MD5, SHA-1, SHA-256, and other fast hashes. They cover huge swaths of commonly-used passwords, dictionary words, and their common variations. Using a fast general-purpose hash function for password storage is not a minor oversight — it's treating your users' credentials as if they were unprotected.
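The lookup idea can be sketched with an ordinary dictionary. This is a plain precomputed table, not a true rainbow table (which compresses the same information into hash chains), and the five-word `WORDLIST` is a hypothetical stand-in for the multi-million-entry lists attackers actually use.

```python
import hashlib

# Hypothetical attacker wordlist; real lists hold millions of entries.
WORDLIST = ["123456", "password", "password123", "qwerty", "letmein"]

def build_lookup(words):
    """Precompute MD5 digest -> plaintext for every candidate word."""
    return {hashlib.md5(w.encode("utf-8")).hexdigest(): w for w in words}

# Expensive step, done once and reused against every leaked database.
lookup = build_lookup(WORDLIST)

# Stand-in for an unsalted MD5 hash found in a leaked database.
leaked_hash = hashlib.md5(b"password123").hexdigest()

# Cheap step: a single dictionary lookup, with no hashing at attack time.
recovered = lookup.get(leaked_hash)
print(recovered)
```

Because the table depends only on the hash function and the candidate words, the same table works against any site that stores unsalted hashes of that type.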
Salting: The Fix That Makes Rainbow Tables Useless
A salt is a randomly generated string, unique per user, that you concatenate with the password before hashing. Instead of storing hash("password123"), you store hash("password123" + "xK9$mQ2p") alongside the salt itself.
The salt doesn't need to be secret — it's typically stored in plaintext next to the hash in your database. Its purpose isn't confidentiality; it's uniqueness. Here's why this defeats rainbow tables:
- Rainbow tables are built for known inputs. When you add a random salt, the attacker would need to rebuild their entire table for every possible salt — which makes precomputation economically infeasible.
- Two users with the same password produce different hashes if their salts differ. So even if an attacker identifies one cracked hash, they can't use that to instantly expose everyone else who used the same password.
- The attacker is forced back to brute force: for each account, they have to try passwords one by one, hashing each attempt with that account's specific salt. This is orders of magnitude slower.
Salts should be generated with a cryptographically secure random number generator and be long enough to be effectively unique — 16 bytes minimum. Don't use usernames or email addresses as salts. They're not random, and they're reused across systems.
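A rough sketch of per-user salting, using only the standard library. The function names are assumptions for illustration, and the underlying hash is still a single fast SHA-256 call, so this shows what the salt does rather than a storage scheme to deploy.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

def salted_hash(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); draw a fresh 16-byte salt when none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)  # backed by the OS CSPRNG
    digest = hashlib.sha256(password.encode("utf-8") + salt).digest()
    return salt, digest

def verify(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = salted_hash(password, salt)
    return hmac.compare_digest(digest, expected)

# Two users pick the same password but receive different salts.
alice_salt, alice_digest = salted_hash("password123")
bob_salt, bob_digest = salted_hash("password123")

print(alice_digest != bob_digest)                       # hashes differ
print(verify("password123", alice_salt, alice_digest))  # login still works
```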
Key Stretching: Making Brute Force Hurt
Even with salting, fast hash functions are a problem. SHA-256 was designed to be fast — modern hardware can compute billions of SHA-256 hashes per second. A GPU cluster can burn through enormous password spaces quickly, even without rainbow tables.
Key stretching is the solution. The idea is to make the hash computation deliberately slow: fast enough that verifying a legitimate login takes a few hundred milliseconds (barely noticeable to users), but slow enough that brute-forcing millions of candidates takes years of compute time instead of days.
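To put rough numbers on that trade-off, here is a back-of-the-envelope calculation. Every figure in it is an assumption chosen for illustration: a 10-million-word list, an attacker managing one billion fast hashes per second, and a stretched hash that costs the attacker the same 250 ms per guess that it costs the server. Real attacker hardware will differ.

```python
def seconds_to_try(candidates: int, hashes_per_second: float) -> float:
    """Time to hash every candidate once against a single salted account."""
    return candidates / hashes_per_second

WORDLIST_SIZE = 10_000_000   # assumed: a top-10-million password list
FAST_RATE = 1_000_000_000    # assumed: one billion fast hashes per second
SLOW_RATE = 4                # assumed: 250 ms per guess, so 4 guesses per second

fast_seconds = seconds_to_try(WORDLIST_SIZE, FAST_RATE)
slow_days = seconds_to_try(WORDLIST_SIZE, SLOW_RATE) / 86_400

print(f"fast hash:      {fast_seconds:.2f} seconds per account")
print(f"stretched hash: {slow_days:.1f} days per account")
```

Under these assumptions the same wordlist goes from a hundredth of a second to roughly a month per account, and that cost is paid again for every salted account.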
The classic technique is iteration: hash the password, then hash the hash, then hash that, thousands of times. PBKDF2 does exactly this. You configure a number of iterations (the "work factor") and it applies the underlying hash function that many times. Doubling the iterations doubles the attacker's cost. As hardware gets faster, you bump the iteration count.
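A minimal PBKDF2 sketch using `hashlib.pbkdf2_hmac` from the standard library. The iteration count of 200,000 is an assumed placeholder, not a recommendation; the right value comes from benchmarking on your own hardware. The function names are illustrative.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed placeholder; tune by benchmarking

def hash_password(password: str, iterations: int = ITERATIONS):
    """Return (salt, iterations, derived_key) for storage."""
    salt = secrets.token_bytes(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, derived

def verify_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Repeat the derivation with the stored salt and iteration count."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)

salt, iterations, derived = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, iterations, derived))
print(verify_password("incorrect horse", salt, iterations, derived))
```

Storing the iteration count next to the salt is what lets you raise it later without invalidating existing hashes.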
Bcrypt: The Algorithm That Stays Relevant
Bcrypt was designed in 1999 specifically for password hashing, and it remains one of the most widely recommended choices today. It incorporates both salting and key stretching natively, and its design has a property that PBKDF2 lacks: its inner loop constantly reads and rewrites about 4 KB of state, which makes it awkward to accelerate on GPUs.
Here's what bcrypt actually does:
- It generates a random 128-bit salt automatically.
- It uses the Blowfish cipher's expensive key setup phase (called EksBlowfish) as the core of its computation. This setup involves iterating the key schedule 2^(work_factor) times. The work factor typically ranges from 10 to 14 in production systems.
- It encrypts a fixed 24-byte string ("OrpheanBeholderScryDoubt") 64 times with Blowfish using the derived key.
- It returns a single string containing the algorithm identifier, work factor, salt, and hash, all self-contained.
The output looks like this: $2b$12$KIcFYeGv1jMY9nBz/WI.MeAqMhK2iXsEp6nQ1jM9RmJxF8wVhYeLe. The 2b is the bcrypt variant, 12 is the work factor (meaning 2^12 = 4096 iterations), and the remaining 53 characters are the salt (first 22) followed by the hash (last 31).
That memory access pattern matters because it limits GPU acceleration. GPUs are extraordinarily fast at simple parallel operations but struggle when every parallel instance needs its own few kilobytes of fast memory that it reads and writes in data-dependent patterns. Bcrypt's key setup is one such algorithm: it doesn't parallelize nearly as efficiently as SHA-256, which means an attacker's GPU advantage is dramatically reduced. Bcrypt is not memory-hard in the strict sense, though. Its 4 KB footprint is small and fixed, so FPGAs and ASICs can still attack it efficiently, which is the gap scrypt and Argon2 were designed to close.
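The self-contained output string can be taken apart with plain string handling. This is only a parsing sketch: the `BcryptParts` type and `parse_bcrypt` function are hypothetical names, and the code does not compute or verify a bcrypt hash. It assumes the standard layout of a 22-character salt followed by a 31-character hash.

```python
from typing import NamedTuple

class BcryptParts(NamedTuple):
    variant: str   # e.g. "2b"
    cost: int      # work factor; key setup runs 2**cost times
    salt: str      # 22 characters in bcrypt's base64 alphabet
    digest: str    # 31 characters in bcrypt's base64 alphabet

def parse_bcrypt(encoded: str) -> BcryptParts:
    """Split '$<variant>$<cost>$<salt><hash>' into its fields."""
    _, variant, cost, rest = encoded.split("$")
    if len(rest) != 53:
        raise ValueError("expected 22 salt characters plus 31 hash characters")
    return BcryptParts(variant, int(cost), rest[:22], rest[22:])

example = "$2b$12$KIcFYeGv1jMY9nBz/WI.MeAqMhK2iXsEp6nQ1jM9RmJxF8wVhYeLe"
parts = parse_bcrypt(example)
print(parts.variant, parts.cost, 2 ** parts.cost)
print(parts.salt, parts.digest)
```

Because the cost and salt travel with the hash, verification needs nothing but this string and the submitted password.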
What About Argon2, Scrypt, and the Modern Alternatives?
Bcrypt has limitations. Its password input is truncated at 72 bytes, which is a real constraint for long passphrases. It predates GPUs being the dominant attack vector, so while it resists them reasonably well, newer algorithms do better.
Scrypt adds a configurable memory requirement on top of iteration count, making it even more resistant to GPU and ASIC attacks. It exposes three parameters: a combined CPU/memory cost (N), a block size (r) that scales memory use, and a parallelization factor (p). Tuning these correctly requires care, but a well-configured scrypt is harder to attack than bcrypt.
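Python's standard library exposes scrypt as `hashlib.scrypt` (available when Python is built against a recent OpenSSL). The sketch below uses N = 2^14, r = 8, p = 1, which are assumed example values rather than tuned recommendations. The `maxmem` ceiling is set explicitly so the example does not depend on the library default.

```python
import hashlib
import hmac
import secrets

# Assumed example parameters; benchmark before choosing real ones.
N, R, P = 2**14, 8, 1
MAXMEM = 64 * 1024 * 1024  # allow up to 64 MiB for the derivation

def scrypt_hash(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key with scrypt."""
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=N, r=R, p=P,
        maxmem=MAXMEM, dklen=32,
    )

salt = secrets.token_bytes(16)
stored = scrypt_hash("correct horse battery staple", salt)

# Memory use is roughly 128 * N * r bytes.
approx_memory_mib = 128 * N * R / 2**20
print(f"approx. memory per hash: {approx_memory_mib:.0f} MiB")

attempt = scrypt_hash("correct horse battery staple", salt)
print(hmac.compare_digest(attempt, stored))
```

Raising N increases both time and memory; raising r increases memory per block. That coupling is why the parameters need to be tuned together.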
Argon2 won the Password Hashing Competition in 2015 and is the current state-of-the-art recommendation from most cryptographers. It comes in three variants: Argon2d (fastest, resistant to GPU attacks), Argon2i (resistant to side-channel attacks), and Argon2id (hybrid, recommended for general password hashing). It has no 72-byte truncation issue, handles parallelism explicitly, and its parameters are easier to reason about than scrypt's. If you're starting a new project today, Argon2id is what you should reach for.
What Developers Must Actually Do
Theory is useful but insufficient. Here's the practical implementation checklist:
- Never store plaintext passwords. This should be self-evident, but data breaches keep proving it isn't. There is no legitimate reason your application ever needs to retrieve a user's original password.
- Never build your own crypto. Use your language's established library: bcrypt in Node.js, passlib in Python, password_hash() with PASSWORD_BCRYPT or PASSWORD_ARGON2ID in PHP, Spring Security's BCryptPasswordEncoder in Java. These handle salt generation and work factor for you.
- Set a work factor that causes ~200-300ms delay on your server hardware. Benchmark on your actual production hardware and adjust. A work factor that takes 300ms today might need bumping in three years.
- Build in work factor migration. When a user logs in successfully, check if their stored hash uses an outdated work factor. If so, re-hash their password with the new parameters and update the database. You can't rehash without the plaintext, so this login-time migration is the only opportunity.
- Always enforce HTTPS. Hashing the password client-side before transmission is optional and does not replace it: the client-side hash simply becomes the "password" the server sees, so anyone who intercepts it can replay it, and server-side hashing must still occur.
- Enforce minimum password length, not arbitrary complexity rules. Longer passwords carry more entropy. A randomly chosen 18-character passphrase with only lowercase letters (about 85 bits) is vastly stronger than a random 8-character password drawn from all printable symbols (about 53 bits).
- Consider a pepper. A pepper is a server-side secret appended before hashing and kept outside the database. It adds another layer: if the database leaks but the application server isn't compromised, the attacker cannot test guesses against the hashes without the pepper.
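The login-time migration from the checklist can be sketched as follows. The example uses PBKDF2 from the standard library so that it runs without third-party packages; the storage format, the function names, the in-memory `users` dict, and both iteration counts are assumptions for illustration. Established libraries typically offer an equivalent "needs rehash" check.

```python
import hashlib
import hmac
import secrets

CURRENT_ITERATIONS = 200_000  # assumed current target

def encode(password: str, iterations: int) -> str:
    """Hash with a fresh salt; store algorithm, cost, salt and hash together."""
    salt = secrets.token_bytes(16)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${dk.hex()}"

def check(password: str, encoded: str) -> bool:
    """Verify using the parameters recorded in the stored string."""
    _, iterations, salt_hex, dk_hex = encoded.split("$")
    dk = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(dk, bytes.fromhex(dk_hex))

def needs_rehash(encoded: str) -> bool:
    return int(encoded.split("$")[1]) < CURRENT_ITERATIONS

def login(users: dict, username: str, password: str) -> bool:
    encoded = users.get(username)
    if encoded is None or not check(password, encoded):
        return False
    # The plaintext is only available here, so upgrade now if needed.
    if needs_rehash(encoded):
        users[username] = encode(password, CURRENT_ITERATIONS)
    return True

# A record created under an older, weaker setting (assumed 50,000).
users = {"alice": encode("hunter2hunter2", 50_000)}
old_record = users["alice"]

print(login(users, "alice", "hunter2hunter2"))
print(users["alice"].split("$")[1])
```

Accounts that never log in again keep their old parameters, so it is worth tracking how many stale hashes remain.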
The Attacker's Remaining Options
Implementing salted bcrypt or Argon2id correctly doesn't make passwords unassailable. It makes offline cracking computationally expensive. Attackers adapt: they focus on commonly-used passwords first (credential stuffing lists, top-10-million password wordlists), they target users with weak passwords, and they look for other attack surfaces entirely — phishing, keyloggers, OAuth token theft.
Proper password hashing is one layer. Combine it with rate limiting and lockout on authentication endpoints, breach detection (checking submitted passwords against known-breached lists via k-anonymity APIs like Have I Been Pwned's), and multi-factor authentication. Password hashing protects your users when you fail — when your database gets stolen. The goal is making that failure cost the attacker enough time and compute that most passwords remain safe long enough for users to be notified and change them.
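The k-anonymity check works by sending only the first five hex characters of the password's SHA-1 hash and comparing the returned suffixes locally. The sketch below shows just the local half and makes no network call. The `fake_body` response is fabricated for illustration, including its counts, and assumes one `SUFFIX:COUNT` pair per line.

```python
import hashlib

def split_sha1(password: str):
    """Return (5-char prefix to send, 35-char suffix kept locally)."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def count_in_range(suffix: str, range_body: str) -> int:
    """Find our suffix among the 'SUFFIX:COUNT' lines for the prefix."""
    for line in range_body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0

prefix, suffix = split_sha1("password123")

# Fabricated response body; a real one would come from the range endpoint.
fake_body = f"0018A45C4D1DEF81644B54AB7F969B88D65:3\n{suffix}:42\n"

times_seen = count_in_range(suffix, fake_body)
print(prefix, times_seen)
```

The service only ever learns the five-character prefix, which is shared by many unrelated passwords, so it cannot tell which one was checked.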
The LinkedIn breach is a case study in what happens when you skip these steps. Salting and a proper work factor would have bought months of cracking time instead of days. For 117 million users, that's the difference between a manageable incident and a lasting catastrophe.