Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#21022 Switch to using bcrypt for hashing passwords and security keys #7333

Open
wants to merge 51 commits into
base: trunk
Choose a base branch
from

Conversation

johnbillion
Copy link
Member

@johnbillion johnbillion commented Sep 11, 2024

Latest approach to switching from phpass to bcrypt via the native PHP password hashing functions.

This covers:

  • User passwords
  • Application passwords
  • Post passwords
  • User password reset keys
  • Personal data request keys
  • The recovery mode key

Lots more info about the change can be found in the draft for the make/core announcement post.

Tickets

Notes

  • wp_check_password() remains compatible with phpass hashes, so password checks will continue to succeed when checking a password against its existing hash. There is no need for users to change or resave their password.
  • User passwords are rehashed using bcrypt after successful authentication in wp_authenticate_username_password() or wp_authenticate_email_password().
  • Application passwords are rehashed using bcrypt after successful authentication in wp_authenticate_application_password().
  • wp_check_password() and the rehashing that occurs in the above functions are forward-compatible with future changes to the default bcrypt cost (the default bcrypt cost was increased in PHP 8.4), so password checks will continue to succeed when checking a password against its existing hash.
  • wp_hash_password() and wp_check_password() retain support for a global $wp_hasher object which can be used to override the password hashing and checking mechanisms.

Elsewhere

FAQs

What about salting?

Salting a bcrypt hash is handled by password_hash() and password_verify(). There is no need to implement salting in userland code.

What about peppering?

Peppering effectively eliminates the ability to crack a password given only its hash by introducing a secret which needs to be stored outside of the database and is used as part of the value that's hashed, as long as the pepper remains secret. This is compelling, however the portability of a password hash is also eliminated and if the secret pepper is lost, changed, or differs between environments, then all password hashes are invalidated. All users will need to reset their passwords in order to log in, and all application passwords and post passwords will need to be changed.

While a secret pepper does prevent an attacker from being able to crack a password hash if they gain access only to the database, the potential usability tradeoff is high. In addition, the intention of switching to bcrypt is to switch to a password hashing algorithm that is highly resistant to cracking in the first place, thus reducing the benefit gained from peppering.

What about layering hashes to immediately protect legacy hashes?

Hash layering is the process of taking an existing password hash (for example one hashed with phpass) and applying the new hashing algorithm on top of it (bcrypt in this case). The intention is to immediately protect stored passwords with the new hashing algorithm instead of waiting for users to log in or change their passwords in order to rehash the password.

A concern is the length of time it could take to rehash all passwords in the database during the upgrade routine. Handling would need to be implemented to cover usage of passwords (for example logging in or changing passwords) while the password upgrade routine runs.

Additional risks include null byte characters present in the raw output of other hashing algorithms, and password shucking. OWASP warns that layering hashes can make the password easier to crack.

None of this is insurmountable, but it adds complexity for what is in most cases a short term benefit. For this reason, hash layering has not been implemented.

What about the 72 byte limit of bcrypt?

This is an ongoing consideration. There is discussion about this limit in the comments on #21022 and on #50027, and a separate PR at johnbillion#5 implements a workaround. Read on for all the details.

Out of the platforms listed above, only Symfony provides any specific handling for passwords greater than 72 bytes in length (introduced in Symfony 5.3 in 2021). Passwords longer than 72 bytes are hashed with sha512 and then base64 encoded prior to being hashed with bcrypt.

The sha512 hashing is used to retain the full entropy of the original password, regardless of its length. The base64 encoding is to protect against null byte characters. Passwords shorter than 72 bytes do not receive hash layering treatment.

sha512 hashing is also used by Dropbox prior to hashing with bcrypt.

This PR in its current form does not include any specific handling for the 72 byte limit, but there are options:

  1. Ignore the 72 byte limit like many platforms do.
    • A maximum of 72 bytes can contribute to the password entropy.
    • Any subsequent bytes are discarded both during hashing and during checking.
    • Relatively risk free as we're not layering hashing.
  2. Implement sha384 or sha512 pre-hashing to preserve the entropy of passwords of any length.
  3. Implement a 72 byte password length limit. This could end up being very complex to implement:
    • Does the user need to be informed when they attempt to use a password longer than 72 bytes?
    • A 72 character limit on password input fields would need active warnings rather than just maxlength attributes because the input value is obscured when typing/pasting/autofilling.
    • 72 bytes does not equal 72 characters and vice versa due to multibyte characters.

What about DoS attacks from long passwords?

The bcrypt implementation in PHP is not susceptible to a DoS attack from long passwords because only the first 72 bytes of the password value are read, therefore there is no need to guard against a long password value prior to it being hashed or checked.

While phpass can be susceptible to a DoS attack via a very long password value, the phpass implementation in WordPress protects against this via a 4096 byte limit when hashing a password and when checking a password. This is unrelated to the password length limit discussed above.

What about the cost factor?

The default cost factor will be used. This is 10 in PHP up to 8.3 and has been increased to 12 in PHP 8.4. Hashes remain portable between installations of PHP that use different cost factors because the cost factor is encoded in the hash output.

It's beyond the scope of WordPress to make adjustments to the cost factor used by bcrypt. If you are planning on updating to PHP 8.4 then you should consider whether the default cost is appropriate for the resources available on your server. The wp_hash_password_options filter is available to change the cost factor should it be needed.

What about using PASSWORD_DEFAULT instead of PASSWORD_BCRYPT?

The intention of the PASSWORD_DEFAULT constant in PHP is to take advantage of future changes to the default algorithm that's used to hash passwords. There are currently no public plans to change this algorithm (at least, I haven't found any), but for safety it makes sense to be explicit about the use of bcrypt in WordPress. This can easily be changed in the future.

What about Argon2?

Unfortunately it's not possible to rely on argon2 being available because it requires both libargon2 to be available on the server and for PHP to be built with argon2 support enabled. Using argon2 via sodium_compat still requires the optional libsodium extension to be installed. Conditionally using argon2i, argon2id, or bcrypt depending on what is available on the server would increase complexity and limit the portability of hashes.

What about scrypt?

There is no native support for scrypt in PHP, and using scrypt via sodium_compat still requires the optional libsodium extension to be installed

Is this change compatible with existing plugins that implement bcrypt hashing?

It should be, yes. If you've used such a plugin to hash passwords with bcrypt then those hashes should be compatible with this bcrypt implementation and you should be able to remove the plugin.

What effect will this have on my database when a user logs in?

The first time that each user subsequently logs in after this change is deployed to your site, their password will be rehashed using bcrypt and the value stored in the database. This will result in an additional UPDATE query to update their user_pass field in the users table.

The query will look something like this:

UPDATE wp_users
SET user_pass = '<hash>', user_activation_key = ''
WHERE ID = <id>

This query performs an UPDATE but makes use of the primary key to target the row that needs updating, therefore it should remain very performant. The query only runs once for each user. When they subsequently log in again at a later date their password will not need to be rehashed again.

Note that when a user logs in, WordPress already writes to the database via an UPDATE query on the usermeta table to store their updated user session information in the session_tokens meta field. This happens every time a user logs in.

How do I use an algorithm other than bcrypt on my website?

The wp_hash_password(), wp_check_password(), and wp_password_needs_rehash() functions are all pluggable. See wp-password-bcrypt as an example of overwriting them in a plugin.

Alternatively, if you need to temporarily or permanently stick with phpass you can instantiate the $wp_hasher global and this will be used instead:

add_action(
  'plugins_loaded',
  function(): void {
    global $wp_hasher;
    require_once ABSPATH . WPINC . '/class-phpass.php';
    $wp_hasher = new PasswordHash( 8, true );
  }
);

Todo

  • Make a decision on the 72 byte limit imposed by bcrypt
  • Decide whether the options array for password_hash() should be filterable.
  • Decide whether wp_hash_password() and wp_check_password() need to retain back-compat support for the global $wp_hasher
  • Fully support changes to the default bcrypt cost
  • Tests for the hash upgrade routine for a user password
  • Tests for the hash upgrade routine for an application password
  • Tests for handling post password hashes
  • Address md5 hashing of passwords in repair.php

Testing

There's good test coverage for this change. Here are some manual testing steps that can be taken.

Remaining logged in after the update

  • Start on the trunk branch
  • Log in to WordPress
  • Switch to this branch
  • Ensure you have remained logged in
  • Log out and back in again
  • Confirm it works as expected
  • Confirm that the user_pass field for your user account in the wp_users table in the database has been updated -- it should be prefixed with $2y$ instead of $P$

Post passwords

  • Start on the trunk branch
  • Publish a post and set a password for it
  • View the post and enter the password to view it
  • Switch to this branch
  • Confirm you can still see the post without having to re-enter its password
  • Edit the post and change its password
  • View the post and confirm that you can re-enter the new password and view it

Password resets

  • Start with the "Lost your password?" link on the login screen
  • Click the confirmation link sent to your email
  • Follow the process of resetting your user password
  • Confirm you can log in with your new password

Personal data requests

  • Start on the trunk branch
  • Log in to WordPress as an Administrator
  • Initiate a data export from Tools -> Export Personal Data
  • Switch to this branch
  • Click the confirmation link sent to the email address and confirm that it still works
  • Initiate another data export from Tools -> Export Personal Data
  • Click the confirmation link sent to the email address and confirm that it works

@johnbillion johnbillion marked this pull request as draft September 11, 2024 18:38

This comment was marked as outdated.

Copy link

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@johnbillion johnbillion changed the title #21022 Switch to using bcrypt for password and security key hashing #21022 Switch to using bcrypt for hashing passwords and security keys Sep 11, 2024
@soatok
Copy link

soatok commented Nov 21, 2024

Using SHA for pre-hashing will cause more trouble in future upgrades (assuming we need to upgrade to argon2id one day) and will not benefit the vast majority of users and will reduce performance.

Can you be specific? What "more trouble" are you envisioning?

If we upgrade to Argon2 one day, all pre-hashing passwords will need to be re-hashed (Argon2 does not have the 72-byte limit), or we can continue to keep base64+sha384, but this is not necessary in argon2.

If we use PHP's default implementation directly, we won't have these troubles (PHP will handle it automatically in future version).

Consider:

if (hash_equals('$2', substr($pwhash, 0, 2)) {
    $prehash = base64_encode( hash_hmac( /* ... */ ) );
    if (password_verify($prehash, $pwhash)) {
        $newHash = password_hash($password, /* ... */).
    }
}

@devhaozi
Copy link

devhaozi commented Nov 21, 2024

Another example for reference is laravel/framework#49752 , where they seem to have decided to do nothing.

@johnbillion
Copy link
Member Author

johnbillion commented Nov 21, 2024

That's correct. Out of the PHP projects that I listed in the PR, only symfony/password-hasher includes any handling for passwords greater than 72 bytes, and even then it still truncates the resulting hash due to the use of sha512+base64 which outputs 88 bytes.

The discussion around hash layering feels dangerously close to rolling your own crypto in order to provide a small amount of additional benefit to an exceptionally small proportion of users.

@soatok
Copy link

soatok commented Nov 21, 2024

Thankfully, this isn't anywhere near "rolling our own crypto". The SHA2+base64 paradigm has been implemented in several proprietary systems I've worked on over the past decade.

I also cited password_lock above as an open source implementation that does a morally equivalent job. If you need a cryptography expert to sign off on this design, ask the developer of that library to review whatever we propose.

and even then it still truncates the resulting hash due to the use of sha512+base64 which outputs 88 bytes.

SHA-384 and SHA-512 are the exact same algorithm, except SHA-384 has a different initialization vector and is truncated to 384 bits. I don't think it makes any sense to obsess over the truncation here.

If you base64-encode a raw SHA-512 hash to get 88 bytes, and then bcrypt truncates it to a "mere" 72, you still have an input keyspace of 6^72 or 486 bits of entropy. This is still in far excess of the 128 bits of entropy necessary to be considered secure.

The SHA-2 family of hash functions is secure in most usages (exception: if you're building a MAC by truncating a key and the message). Using HMAC eliminates length-extension, and you can treat HMAC with a secret key as a random oracle.

If you want to be extra paranoid, instead of base64_encode, prefer sodium_bin2base64 if it's available.


At the risk of being mildly self-promotional, I'd like to refer any curious parties to some of my previous writing on this subject if anything I've said here (or are likely to say in subsequent comments) is a little over your head:

These algorithms, and this specific way of combining these algorithms, is well-understood.

@devhaozi
Copy link

Thankfully, this isn't anywhere near "rolling our own crypto". The SHA2+base64 paradigm has been implemented in several proprietary systems I've worked on over the past decade.

I also cited password_lock above as an open source implementation that does a morally equivalent job. If you need a cryptography expert to sign off on this design, ask the developer of that library to review whatever we propose.

and even then it still truncates the resulting hash due to the use of sha512+base64 which outputs 88 bytes.

SHA-384 and SHA-512 are the exact same algorithm, except SHA-384 has a different initialization vector and is truncated to 384 bits. I don't think it makes any sense to obsess over the truncation here.

If you base64-encode a raw SHA-512 hash to get 88 bytes, and then bcrypt truncates it to a "mere" 72, you still have an input keyspace of 6^72 or 486 bits of entropy. This is still in far excess of the 128 bits of entropy necessary to be considered secure.

The SHA-2 family of hash functions is secure in most usages (exception: if you're building a MAC by truncating a key and the message). Using HMAC eliminates length-extension, and you can treat HMAC with a secret key as a random oracle.

If you want to be extra paranoid, instead of base64_encode, prefer sodium_bin2base64 if it's available.

As mentioned before, passwords longer than 72 bytes are rarely used, so it is worth considering whether it is worth doing this.

For sodium_bin2base64, it is impossible, WP usually only accepts PHP's own implementation (which is why Argon2 is not used, Argon2 requires PHP to use additional compilation parameters).

There is still a long time before 6.8, let's see what other developers think.

@soatok
Copy link

soatok commented Nov 21, 2024

For sodium_bin2base64, it is impossible,

if (!is_callable('sodium_bin2base64')) {
/**
* @see ParagonIE_Sodium_Compat::bin2base64()
* @param string $string
* @param int $variant
* @return string
* @throws SodiumException
* @throws TypeError
*/
function sodium_bin2base64(
#[\SensitiveParameter]
$string,
$variant
) {
return ParagonIE_Sodium_Compat::bin2base64($string, $variant);
}
}

:)

@devhaozi
Copy link

For sodium_bin2base64, it is impossible,

if (!is_callable('sodium_bin2base64')) {
/**
* @see ParagonIE_Sodium_Compat::bin2base64()
* @param string $string
* @param int $variant
* @return string
* @throws SodiumException
* @throws TypeError
*/
function sodium_bin2base64(
#[\SensitiveParameter]
$string,
$variant
) {
return ParagonIE_Sodium_Compat::bin2base64($string, $variant);
}
}

:)

If so, is there a pure PHP implementation of Argon2? Using Argon2 directly would not have these annoying problems.

@johnbillion
Copy link
Member Author

I've covered argon2 and scrypt in the PR description. Unfortunately there is not.

@soatok
Copy link

soatok commented Nov 21, 2024

As far as I'm aware, sodium_compat does not include scrypt or argon2.

@johnbillion
Copy link
Member Author

Again, I've covered this already in the PR description. sodium_compat still requires the optional libsodium extension in order to support argon2 or scrypt. There is no polyfill for either in sodium_compat.

I think we should put a pin in this conversation about the 72 byte limit of bcrypt. It's an interesting computer science problem but it's so far removed from any real life problem that it distracts from assessing the underlying switch to bcrypt.

A 72 byte password has 576 bits of entropy. As soon as a password is over 16 bytes in length, you're beyond the widely recommended 128 bits of entropy to consider a password secure. There is no practical need to retain a higher entropy for passwords greater than 72 bytes.

@soatok
Copy link

soatok commented Nov 21, 2024

it's so far removed from any real life problem that it distracts from assessing the underlying switch to bcrypt.

I respectfully disagree, for two reasons:

  1. WordPress has historically bent over backwards to accommodate a small minority of users. This is evident in the minimum PHP bump from 5.2.4 to 5.6 taking so many years after the PHP team stopped supporting it. That only a few users will see a tangible difference has, historically, not been a valid reason to make a security change.
  2. Bcrypt + SHA2 + Base64, as discussed above, materially improves the security and raises the bar for the PHP ecosystem, making WordPress a leader rather than a follower.

The 72-byte limit isn't the only footgun of vanilla bcrypt. Truncating on NUL characters is the other (which is why eschewing the base64 encoding is a bad move).

That being said, consider users with long passphrases. but where each word in the passphrase is low-entropy (e.g. diceware). They're well-served by this change.

That being said:

I think we should put a pin in this conversation about the 72 byte limit of bcrypt.

That's fine. This is my final word on this topic, unless explicitly queried by another user.

@devhaozi
Copy link

devhaozi commented Nov 21, 2024

it's so far removed from any real life problem that it distracts from assessing the underlying switch to bcrypt.

I respectfully disagree, for two reasons:

  1. WordPress has historically bent over backwards to accommodate a small minority of users. This is evident in the minimum PHP bump from 5.2.4 to 5.6 taking so many years after the PHP team stopped supporting it. That only a few users will see a tangible difference has, historically, not been a valid reason to make a security change.
  2. Bcrypt + SHA2 + Base64, as discussed above, materially improves the security and raises the bar for the PHP ecosystem, making WordPress a leader rather than a follower.

The 72-byte limit isn't the only footgun of vanilla bcrypt. Truncating on NUL characters is the other (which is why eschewing the base64 encoding is a bad move).

That being said, consider users with long passphrases. but where each word in the passphrase is low-entropy (e.g. diceware). They're well-served by this change.

That being said:

I think we should put a pin in this conversation about the 72 byte limit of bcrypt.

That's fine. This is my final word on this topic, unless explicitly queried by another user.

The criterion for WordPress to stop supporting PHP is the usage rate (as far as I know, this value is less than 5%).
I think the percentage of users using passwords longer than 72 bytes is much less than 5%.

@mbijon
Copy link

mbijon commented Nov 25, 2024

I'm with @johnbillion on straight password->bcrypt.

It's counterintuitive, but it will be more secure than password->sha512->bcrypt. Plus, why are we arguing about 470 vs 500+ bits of entropy when ANY form of bcrypt is >30,000,000 more time-consuming to brute force than md5.


The cryptographers who designed bcrypt were well aware it limited possible entropy to ~470 bits. They compromised there on purpose. At modern cost factors it's 1,000's of years of complexity to brute force a 72 character password. But that limitation prevents resource exhaustion DoS attacks on lower-end servers.

Hashing password->base64(sha512)->bcrypt is less secure in several ways. It enables breach correlation / shucking more easily than raw bcrypt. Plus because of the base64 encoding it LOSES entropy from ALL input passwords. This gets dangerous for the large % of users who have 10 char passwords. They start with a max of ~65 bits of entropy, then clipping from 88 -> 72 characters loses some of that already limited entropy. Progressively more pre-bcrypt steps to deal with this clipping just risks opening DoS vectors.

...meanwhile most users (70-80% per HaveIBeenPwned) are using common, short, and/or recycled passwords. If we're really worried about password security we should raise WP's minimum password length, not bikeshed something cryptographers already considered when designing bcrypt.

@soatok
Copy link

soatok commented Nov 25, 2024

@mbijon Your comment is wrong on several levels, including one I had been ignoring in the previous discussion because I didn't want to derail it with pedantry. Unfortunately, this pedantry is now necessary.

It enables breach correlation / shucking more easily than raw bcrypt. Plus because of the base64 encoding it LOSES entropy from ALL input passwords. This gets dangerous for the large % of users who have 10 char passwords. They start with a max of ~65 bits of entropy, then clipping from 88 -> 72 characters loses some of that already limited entropy. Progressively more pre-bcrypt steps to deal with this clipping just risks opening DoS vectors.

The most obvious evidence against this statement is the real-world security impact of bcrypt's truncation.

It enables breach correlation / shucking more easily than raw bcrypt. Plus because of the base64 encoding it LOSES entropy from ALL input passwords. This gets dangerous for the large % of users who have 10 char passwords.

This is nonsense, which I will explain below.

They start with a max of ~65 bits of entropy, then clipping from 88 -> 72 characters loses some of that already limited entropy.

This is also nonsense, which I will explain below.

Progressively more pre-bcrypt steps to deal with this clipping just risks opening DoS vectors.

The performance impact of a SHA512 hash compared to a password hashing algorithm is negligible. You can benchmark it yourself.

<?php
// Initialize up front
$start = $stop = microtime(true);

// reasonably long example "password"
$x = str_repeat('A', 128);
$y = hash('sha512', $x, true); 
$i = 0;


$start = microtime(true);
for ($i = 0; $i < 100_000; ++$i) {
    $y = hash('sha512', $y, true); 
}
$stop = microtime(true);

echo '100,000 iterations of SHA-512 took ', number_format($stop - $start, 3), ' seconds.', PHP_EOL;

$start = microtime(true);
$y = password_hash($x, PASSWORD_BCRYPT);
$stop = microtime(true);

echo '1 iteration of bcrypt took ', number_format($stop - $start, 3), ' seconds.', PHP_EOL;

Here's the output on my machine:

100,000 iterations of SHA-512 took 0.039 seconds.
1 iteration of bcrypt took 0.045 seconds.

We aren't doing 100,000 iterations of SHA-512 here, only 1 or 2 (in the case of HMAC).

The DoS risk for this pre-bcrypt hashing isn't real and can't hurt you.

...meanwhile most users (70-80% per HaveIBeenPwned) are using common, short, and/or recycled passwords. If we're really worried about password security we should raise WP's minimum password length, not bikeshed something cryptographers already considered when designing bcrypt.

You... do realize that a lot of cryptographers don't like bcrypt, right? That's why the whole Password Hashing Competition existed. We settled on Argon2id and yescrypt. The whole cryptography community was there.

EDIT - On My Remark About Cryptographers' Dislike of Bcrypt

I've received feedback that the previous statement wasn't precise (which is fair; I was aiming for clarity, not precision). So here's a bit of a pedantic explanation.

When comparing Bcrypt and Argon2:

  1. Bcrypt is cache-hard, while Argon2 is only cache-hard if you use the non-standard ds variant.
  2. Bcrypt is not suitable as a KDF (unless you do something very nonstandard).

Some of the Password Hashing Competition judges have lamented that they didn't prioritize cache-hardness over memory-hardness, and they recommend bcrypt instead of Argon2 because Argon2 is better as a KDF than a password hash for low latency settings.

I wrote about a lot of these nuances in my blog post, which was linked above.

One of the PHC judges went on to design an algorithm called bscrypt to provide stronger cache-hardness guarantees. But then he found a vulnerability in his own code, and he urged people not to use it until it's fixed. Perhaps in a year or two we can have a successor to the PHC that focuses on password hashing, not password KDFs.

When I said "cryptographers don't like bcrypt", I mean something very specific:

If you walk up to a cryptographer and say, "I'm designing an authentication system. It uses bcrypt for password hashing." you will get one of two responses:

  1. "Is there anything preventing you from using Argon2 or scrypt?"
  2. "Okay. I don't care which you use, as long as it's a password hashing function and not, like, MD5."

Very rarely will you hear, "Okay, yes, thank you for using bcrypt and not scrypt or Argon2."

Separately, OWASP (which is not famously a home for cryptographers but is highly regarded in appsec circles) in particular wants bcrypt to die in a fire precisely because of the 72-char and null-truncation vulnerabilities.

I'm personally in the "as long as it's not, like, SHA256(password), and you're using an actual password hash function" camp, I just wanted to suggest removing the bcrypt footguns.

On Password Shucking

This isn't a real concern for bcrypt(SHA512()) because the previous password hash was phpass, which is based on MD5.

If we were storing the SHA512 hash of the password somewhere, a) that's stupid and b) we should domain-separate our use of SHA512. But I don't think WordPress or any plugins are doing that particular snafu, so it's unlikely to matter.

Revisiting the Misleading Claims About Entropy

The competing proposals looks like this (vaguely):

  1. Hash (or HMAC) the password with SHA-512.
    • SHA-384 is morally congruent to SHA-512, since it's the same exact algorithm but with truncation built-in, and a distinct IV for domain separation. The round functions, word size, and operations are preserved between the two. You don't need to care about this distinction.
  2. Base64-encode the hash output from step 1.
  3. Use the output of step 2 with password_hash or password_verify.

SHA-512 is a hash function. Aside from length extension (which isn't relevant to our threat model), you can model any of the SHA-2 family hash functions as a Random Oracle.

SHA-512 maps an input domain of 256^(2^(63)-1) possible inputs to a mere 512 bits. The input isn't guaranteed to be uniformly random (as they are passwords), but the output is.

Base64-encoding the output of a hash function does not "[enable] breach correlation / shucking more easily than raw bcrypt". At the layer of base64, we have a uniformly random distribution of 512 bits, which gets encoded as slightly more characters from the base64 alphabet. There is no entropy loss at the base64 layer.

The base64 encoding does not, at all, "[lose] entropy from ALL input passwords."

The "entropy" of the password chosen is preserved as long as SHA-512 collisions are computationally infeasible.

One objection to my previous statement might be, "But what about the 88 -> 72 truncation?"

This is still secure. 72 characters from the base64 alphabet is still a keyspace representing about 432 bits (or 54 bytes without base64 encoding). This is only 10 bytes short of an untruncated SHA-512, and 6 bytes longer than SHA-384.

The best attack against 432 bits of uncertainty is a birthday collision attack. You'd need about 2^216 samples for a 50% chance of a collision (or 2^144 samples for a 1/2^144 chance of a collision).

2^-100 isn't going to happen, stop trying to make 2^-100 happen.

Any concerns about the reduction of entropy from the strategies we've been discussing are either rooted in a poor understanding of password-based cryptography, or are bikeshedding over differences that might as well be zero to any WordPress user.

Meanwhile, that Okta breach is a real thing that happened because of bcrypt's truncation. Maybe we should learn from real world incidents?

(EDIT: Updated script, increased iteration count to be closer to bcrypt.)

@soatok

This comment was marked as off-topic.

@mbijon
Copy link

mbijon commented Nov 26, 2024

Stop it @soatok. You're undermining a clear improvement on md5 with incorrect information:

Okta's breach was caused by using bcrypt outside it's design params as a cache key generator. They prefixed heavily in a way that consumed most of the 72 bytes in bcrypt's input. This PR does the right thing and uses password-only as input and without hashing, as recommended by both OWASP and NIST.

The hobbyist community's concern over collisions between two 73-char passwords passed directly to bcrypt is wrong in two ways:

  1. Most of the examples will have an example where bcrypting 73 chars like aaa...aaa1 and aaa...aaa2 produce the same output. That's some userland mess that doesn't happen as long as you don't pass a salt to PASSWORD_BCRYPT. Because bcrypt internally salts with Blowfish randoms, two users can use the SAME password of ANY length and get different bcrypt outputs.
  2. Anyone using 73+ char passwords is using a generator with some level of randomness and not aaa..., 123...123... or anything remotely simple.

SHA hashing a password does not increase entropy beyond the original password's. SHA may randomize the characters but there's a symbolic (1:1) correlation between those characters and a low-entropy password. Clipping the length of a base64'd string does matter. It reduces the space to store the shortened SHAs of all 10-char passwords.

Regarding cryptographers, NIST does approve of bcrypt and will for several more years. (...it's looking to be past 2029 before FIPS 140-3 validates any modules that might not include bcrypt. More like 2032-2035 if this DOGE department cuts NIST funding).

You are right that Argon2id would be preferable. However the way to make that happen is to push it to the PHP community and not the WP community. md5 however, has been deprecated well over a decade and @johnbillion's approach is the simplest, most recommended way to implement bcrypt.

@soatok
Copy link

soatok commented Nov 26, 2024

Stop it @soatok. You're undermining a clear improvement on md5 with incorrect information:

  1. It's generally considered rude to tell strangers to stop "it". Whatever "it" is, in this context. You're nobody's boss here.
  2. Regardless of the choice (which is a one-way door): bcrypt with SHA+base64, or plain bcrypt both materially improve over phpass.

Even if WordPress decides against my recommendations, it's worth discussing the facts so it's an informed decision.

I had rested my case until you came along and posted falsehoods. (Accusing me of "incorrect information" is funny.)

SHA hashing a password does not increase entropy beyond the original password's.

I didn't say it did.

Using SHA2 to hash a password permutes the bits through the entire hash output, which means the reduction of the base64-encoded hash output is not a meaningful security reduction. That is my argument.

I made no claims about the entropy of the input domain, only the output.

SHA may randomize the characters but there's a symbolic (1:1) correlation between those characters and a low-entropy password.

This is a meaningless distinction, with my previous statement in mind.

Clipping the length of a base64'd string does matter. It reduces the space to store the shortened SHAs of all 10-char passwords.

I genuinely do not understand this argument.

If you take a raw SHA-512 hash and encode it with base64, there is no entropy loss due to the encoding of the raw hash, and the truncation of such a large hash still has a sufficiently enormous probability space that it doesn't help attackers.

The best possible attack against a truncated SHA-512 is a collision attack. I provided the risk for bcrypt-sha512+base64: After 2^144 passwords, there is a 2^-144 probability of a collision.

For vanilla bcrypt, it's 2^192 for 2^-192.

If you really want to argue that 2^-192 matters more than 2^-144, I invite you to write a paper for your argument and submit it to a journal for an IACR-associated event (e.g., Real World Cryptography) and then share your inevitable rejection letter publicly.

Analyzing how the algorithms transform data can tell you if there is a meaningful security reduction in the steps being taken. The short answer is: No. The truncation of an encoded SHA-512 hash is still longer (and of the same quality) as a SHA-384 hash, which is what the US government recommends in CNSA. If we were truncating to less than 256 bits, there might be cause for concern, but that's not the recommendation here.

A weak, 10-character password is just as weak whether or not you use SHA-512+base64 as described above. No meaningful weakness is introduced by this process, and it prevents one that has broken real-world systems.

Okta's breach was caused by using bcrypt outside it's design params as a cache key generator. They prefixed heavily in a way that consumed most of the 72 bytes in bcrypt's input. This PR does the right thing and uses password-only as input and without hashing, as recommended by both OWASP and NIST.

The point of writing misuse-resistant cryptography (which is what the bcrypt with SHA-512+base64 is) is that you cannot envision all of the possible ways that an API will be used.

Okta previously used bcrypt in a way that wasn't recommended by experts. Developers do this all the time. See also: JWT libraries that accept alg=none.

If someone decides to write a WordPress plugin that uses the WordPress password hashing code for application passwords, and encodes the passwords as email address || password, and users supply a very long email address:

  • With phpass, their passwords won't be hashed with a memory-hard password hash which sucks. That's the situation today.
  • With bcrypt, changing the WordPress password hashing class to vanilla bcrypt will introduce a vulnerability that didn't previously exist, due to the 72 character truncation.
  • With bcrypt-SHA512+base64, you get the best of both worlds.

You're arguing for the second state, I'm saying the third is the best of both worlds.

In order for it to be correct that I'm "undermining a clear improvement on md5 with incorrect information", I would have to a) be arguing for phpass to continue and b) be stating falsehoods.

Regarding cryptographers, NIST does approve of bcrypt and will for several more years.

This is incorrect. Bcrypt is not a NIST recommend algorithm, and you cannot use it in a FIPS module. You have to use PBKDF2 with a NIST-approved hash function.

NIST does vaguely recommend a "memory hard" function, but they go on to reference Balloon, not bcrypt.

(...it's looking to be past 2029 before FIPS 140-3 validates any modules that might not include bcrypt. More like 2032-2035 if this DOGE department cuts NIST funding).

Are you sure about that?

You are right that Argon2id would be preferable. However the way to make that happen is to push it to the PHP community and not the WP community.

Right. It's a trade-off. I'm not pushing for Argon2id today, I'm telling you what cryptographers prefer today. I know this because cryptography is my job.


Aside: I'm not the first person to recommend this construction. It's had nearly a decade of scrutiny, even if you only look at the PHP community.

It's comical to suggest that vanilla bcrypt is safer than the construction I recommend.

@pkevan
Copy link

pkevan commented Nov 26, 2024

Testing
There's good test coverage for this change. Here are some manual testing steps that can be taken.

Remaining logged in after the update

We should try testing this with wp session too, both the standard version and any extended implementation if possible - the expectation is that they aren't affected (user would remain logged in), but it wouldn't harm to check.

Note that when a user logs in, WordPress already writes to the database via an UPDATE query on the usermeta table to store their updated user session information in the session_tokens meta field. This happens every time a user logs in.

@johnbillion and I also discussed the potential result of this on large datasets, and assumed it wouldn't result in much higher load, and that a bulk process to update this wouldn't be possible due to how passwords are stored/retrieved. More view on this welcome though!

@Synchro
Copy link

Synchro commented Nov 26, 2024

Another example for reference is laravel/framework#49752 , where they seem to have decided to do nothing.

I made this Laravel PR to put a bcrypt length check into a validation step, rather than having it just truncate silently, but that was rejected too.

@soatok
Copy link

soatok commented Nov 26, 2024

Meta comment: I've updated my comment earlier in the thread to clarify one of my statements. I wanted to call attention to this in case it is overlooked.

@dd32
Copy link
Member

dd32 commented Nov 28, 2024

Having read and reviewed this entire PR, I agree with @johnbillion here with retaining the original bcrypt implementation and limitations. Initially I was onboard with SHA384+bcrypt, but after reading, I agree with just bcrypt.

I would suggest that if using bcrypt for >72char is deemed inappropriate, perhaps it should fall back to the current phpass stretched-md5 hashing method, especially if no consensus can be achieved.

That would result in maximum compatibility for existing WordPress users who use the Password hashes outside of WordPress, while also not introducing yet-another-custom-hash into the web where it's not overly obviously necessary, but while still gaining the bcrypt advantages for where it's possible.

@soatok
Copy link

soatok commented Nov 28, 2024

I would suggest that if using bcrypt for >72char is deemed inappropriate, perhaps it should fall back to the current phpass stretched-md5 hashing method, especially if no consensus can be achieved.

I strongly advise against doing this.

While it's generally assumed that the entropy of a 72+ character password is sufficiently good that nobody is going to crack them, even if it's a weaker password hash like phpass, doing so would kill any chance of a world where everyone is on bcrypt or better.

If possible, I'd suggest trying to measure if this is a real problem before committing to it.

/**
 * Is this string longer than the target length?
 * Avoids side-channels.
 * @param string $input
 * @param int $targetLength
 * @return bool true if the length of  $input exceeds $targetLength
 */
function isLongerThanCT(string $input, int $targetLength): bool
{
    $amount = PHP_INT_SIZE === 8 ? 63 : 31;
    $len = strlen($input); // use mb_strlen($input, '8bit'); on older PHP
    $res = (($targetLength - $len) >> $amount) & 1;
    /*
     * It's worth taking a moment to explain what the hell the line above is doing.
     * 
     * ($targetLength - $len) will return a negative value if $targetLength is smaller than $len.
     *
     * By two's complement, negative values when shifted 31 or 63 places to the right will have a lower
     * bit set. We then use a bit mask of `1` and the bitwise `&` operator on the result of the bitshift.
     *
     * End result? It's the same as if we did the following, except constant-time:
     * $res = (int) ($len > $targetLength);
     */
    return (bool) ($res);
}

This function returns true if the string has a length greater than or equal to the target length. This calculation is performed without branches or inequality operators and should therefore have a constant-time execution.

You can use this to measure if the length of users' passwords exceeds some threshold.

Demo: https://3v4l.org/BZK5d

Separately, I wrote a blog post exploring the details of bcrypt with SHA2 titled, Beyond Bcrypt.

@mbijon
Copy link

mbijon commented Nov 30, 2024

Please @dd32 let’s not fallback to any form of md5 hashing. The form of md5 stretching in PHPass isn’t built to consume time and can’t be configured to extend the time (rounds). Also, it creates a herd vulnerability on large sites like wordpress.com or WPEngine because it doesn’t have per-user salts (it only peppers passwords with what’s named a “SALT” in wp-config). Plus, there are already enough vulnerabilities in md5 for it to be considered broken since 2008.

Any not-broken form of bcrypt password storage (ie: not like Okta’s old implementation) is more secure than any form of md5 storage.

@raphaelahrens
Copy link

FYI OWASP/CheatSheetSeries#1551

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants