Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support rfc6530 #5237

Open
wants to merge 5 commits into
base: trunk
Choose a base branch
from
Open

Support rfc6530 #5237

wants to merge 5 commits into from

Conversation

arnt
Copy link

@arnt arnt commented Sep 18, 2023

This adds support for unicode email addresses in is_email and sanitise_email. It is intended to be sufficient to accept unicode email addresses in contact forms etc.,

it does not aspire to unicode support everywhere, in particular it doesn't deal with unicode in logins or passwords.

It also does not add a tests/formatting/sanitizeEmail.php; I wouldn't mind adding one, but don't quite understand your testing policies. It looks like you want to keep the tests fast by avoiding unnecessary tests, I'd say, but I don't understand what you consider necessary and unnecessary. Anyway, if you want a file I'm happy to add one.

Trac ticket: https://core.trac.wordpress.org/ticket/31992

@arnt arnt force-pushed the support-rfc6530 branch 2 times, most recently from 18ce341 to c8587f1 Compare September 18, 2023 12:13
@arnt
Copy link
Author

arnt commented Sep 22, 2023

I don't understand the test failures. They appear to be unrelated to these code changes. Could you take a look, @ironprogrammer ? Thanks.

@ironprogrammer
Copy link

In response to @arnt:

It also does not add a tests/formatting/sanitizeEmail.php; I wouldn't mind adding one...

Absolutely, please do add unit tests (or updates to existing tests) for sanitize_email(). While Core test code coverage is pretty low today 😓, it is a project policy that new Core merges include appropriate tests.

I don't understand the test failures. They appear to be unrelated to these code changes. Could you take a look, @ironprogrammer ? Thanks.

It's not uncommon for E2E test timeouts to occur in this repo's automated jobs. These can be ignored.

@ironprogrammer
Copy link

Thanks for the updated PR! The updates to sanitize_email() appear to have resolved the issue with storing the correct email in the database 🎉.

This isn't a formal test report (there are lots of scenarios needed), but I ran through a few smoke tests so as to get back to you quickly on what could be addressed next. It would be great to get additional reviewers/testers involved to cover what I've missed:

  • ✅ Add New User: email with unicode characters is accepted and saved to the database correctly.
  • ✅ Profile (edit user): email can be changed with unicode characters, and when saved, persists correctly.
  • ❓ Do new user creation, email change confirmation, or password change emails go to the correct address?
  • ❓ Do links in these emails work correctly?
  • ❓ Does the forgotten password workflow work correctly with a unicode address?
  • ✅ Login: logging in with a unicode email works as expected.
  • ✅ Add post comment: email with unicode is accepted for post comments.
  • ❌ Comments (approval list): emails with unicode are not displayed correctly (Figure 1).
  • ✅ In the Comments list, the Email field under Quick Edit does appear correctly.
  • ❓ Site setup wizard: can the initial admin use a unicode address, and do the emails work correctly?
  • ❓ Do various author dropdowns display unicode addresses as expected?
  • ❓ Are unicode emails in feeds presented accurately?

Figure 1: Display of commenter unicode email addresses.
comments approval screen with incorrectly displayed unicode email addresses

@arnt
Copy link
Author

arnt commented Sep 27, 2023

Sigh. I see what you mean.

My own focus is/was quite different: I wanted Contact Form 7 and similar to accept unicode email addresses, which requires some utility functions in Wordpress to do the right thing. I didn't aspire to complete support in Wordpress. The thinking behind your response seems to be that either Wordpress supports it properly or not at all, there's no inbetween.

And, sadly, I think you're right and I was wrong.

I'll extend the PR. It'll take a few weeks. I'll also find some testers.

@arnt
Copy link
Author

arnt commented Sep 16, 2024

Hi,

you asked for wider testing. Around 20 sites have now run the patch in production, the longest-running one for more than six months already. Nothing has broken. The sites are real (University of Somewhere, not someone's private five-visitor site).

If you want, I can send you the email addresses of some testers. When I pinged them last week, around half said it was okay to give you their addresses, I'm sure most will say the same if I ask a few times ;)

Copy link

github-actions bot commented Sep 16, 2024

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props agulbra, ironprogrammer.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@arnt
Copy link
Author

arnt commented Sep 16, 2024

I liked the accounts earlier today.

Copy link

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@ironprogrammer
Copy link

Thanks for the update, @arnt, and for seeking wider testing! There are still some outstanding questions and one previous failure to consider, but it's great to see this moving forward! 🙌🏻

A good next step would be to rebase this PR on the latest trunk, and to resolve the coding standards issues to tidy things up for an updated review and testing. When the CI checks are in a happy state, then it will help attract more attention.

@arnt
Copy link
Author

arnt commented Sep 23, 2024

Finding testers who were willing and able to run a patch on sites that mattered to them was hard work. That took a good long while. Do you want their names? If so, please send me mail.

Rebasing is not a problem, of course. The other stuff was a little difficult. I only recently found a good workaround for IntlChar's lack of support for UAX24.

Can you suggest a model end-to-end test that I could emulate and write automatic tests for the desired features? Manual testing is possible, but I'd much prefer to have complete automatic test coverage.

arnt added 2 commits December 3, 2024 15:03
This adds support for the unicode address extensions in RFC 6532, adds
unit tests for that, extends the documentation to explain the relationship
between this code and the various specifications, and finally adds unit
tests to ensure that the documentation's description of the code remains
correct.

Fixes #31992.
@arnt
Copy link
Author

arnt commented Dec 3, 2024

Hi,

the remaining problems, as far as I can tell:

  1. The code now requires PHP 7.4, which I thought was OK but now I see that 7.4 is merely recommended, not required. Oops. Not clear to me how to do mb_str_split in 7.2.
  2. I've clicked around, but don't feel sure that I have all of the "various dropdowns". The filters in default-filters.php are okay, but I don't feel confident that every third-party plugin will be okay, or that I've looked at everything in the UI. Wordpress is large, there are nooks and corners…

arnt added 3 commits December 4, 2024 09:50
This accepts user names that contain a single script, but not mixed-script
names, such as ones that mix Latin and Cyrillic. That seemed to be closest
to the code's existing philosophy.

Since PHP 7.2 and 7.3 don't offer mb_str_split, this change leaves sites
on those versions of PHP with ASCII-only user names. Existing user names
should continue to work unchanged, adding new (non-ASCII) user names will
not work.
Until now, WordPress generated all-ASCII slugs, which works well for
mostly-ASCII languages. Now that WordPress supports scripts like
Devanagari much better, it's necessary to test the slug support for
Devanagari. What WordPress did until now works well for Devanagari, so
this commit merely adds a unit test.

The ideal is to have links such as
	<hebrewdomain>.<hebrewtld>/<hebrewpath>
which is shown onscreen as
	<hebrewpath>/<hebrewtld>.<hebrewdomain>
without any switches between right-to-left Hebrew and left-to-right Latin.
/author/ and a few others don't achieve that ideal. Yet.

Note that there is a rare existing bug: The existing mapping to ASCII can
produce the same slug for two pages, since two letters can map to e.g.
'L'. This seems unfixable, since any change might break existing links,
but happily also extremely rare.
@arnt
Copy link
Author

arnt commented Dec 4, 2024

I added some code to prevent PHP 7.2.24-7.3.x from choking on mb_str_split().

A bit on testing:

  • ✅ Add New User: email with unicode characters is accepted and saved to the database correctly.
  • ✅ Profile (edit user): email can be changed with unicode characters, and when saved, persists correctly.
  • ✅ New user creation, email change confirmation, and password change emails go to the correct address.
  • ✅ Links in all messages work correctly AFAICT (plugins can add new kinds of course…).
  • ✅ The forgotten password workflow works correctly with a unicode address.
  • ✅ Login: logging in with a unicode email works as expected.
  • ✅ Add post comment: email with unicode is accepted for post comments.
  • ✅ Comments (approval list): emails with unicode are displayed correctly.
  • ✅ In the Comments list, the Email field under Quick Edit does appear correctly.
  • ✅ WordPress works on an IPv6-only server (oh, you weren't wondering?)

And… oh drat… Must retest the site setup wizard, too much code has changed.

Two questions remain.

  1. What author dropdowns display email addresses? I can't find anything. Lots of user names displayed but not email addresses. Maybe I misunderstand. I checked for usage of email in the code that reads the database and use of that code in display code.
  2. How can email addresses ever appear in feeds? The feed generators don't seem to reveal authors etc.

@arnt
Copy link
Author

arnt commented Dec 4, 2024

Just set up a new site from scratch using 'rød@grå.org' as admin address; it works.

@arnt
Copy link
Author

arnt commented Jan 3, 2025

Hi @ironprogrammer,

would you mind having a look?

Also, on an unrelated matter, I'm with you. I've worked as a full-time maintainer on a widely used open source project and got a lot of flak. It hurt. Some people can be really shitty. Don't let it grind you down.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants