
HTML API: Provide mechanism to scan all tokens in an HTML document, not only the tags. #5683

Closed
wants to merge 1 commit into from

@dmsnell (Member) commented Nov 17, 2023

Trac ticket: Core-60170
Companion port into Gutenberg: WordPress/gutenberg#58107 (contains additional porting code)

This PR provides full tokenization scanning of an HTML document. This is being added into the Tag Processor and will be a necessary component for a number of related changes to the HTML API:

  • Reading and modifying inner and outer content.
  • Serializing HTML.
  • Methods to transform text content while preserving or stripping away markup.

Enables syntax-aware processing such as wp_truncate_html() [gist]
Replaces/incorporates chunked/extended processing in #5050
Replaces/incorporates stopping at comments in dmsnell#7
Provides critical functionality for inner/outer getter/setter in [dmsnell#10, #4965]

Depends on #5721
Depends on #5725

Todo

  • Change CDATA sections and PI Nodes into comments with a new "comment type" flag.
  • Ensure HTML Processor can seek around without messing up.
  • Review all $this->bytes_already_parsed assignments and make sure they are proper. I suspect half of them are off by one.
  • Add function docblocks.
  • Review what this enables in the html5lib test suite.
  • Use this internally in the HTML Processor to ensure that breadcrumbs are generated.
  • Explore using combinable bit flags for the token types instead of string constants. This would allow for things like MATCHED_TAG | TEXT_NODE and INCOMPLETE | COMPLETE, which could simplify some logic that's spread across if statements.
    • For this PR it's not worth moving to boolean logic like this. The existing code is clearer for review.
  • Add test suite.
  • Distinguish too-short HTML comments that may cause trouble when modifying. E.g. <!--->.
  • Discuss what to do about PI Nodes and CDATA sections
    • It's possible to identify these after identifying the bogus comment span, but we can't look for the ending syntax of these sections because HTML stipulates that they end at the first >, not the closing ]]> or the closing ?>. So we can find all HTML comments, and then determine if they would have been a CDATA or PI Node if HTML supported those.
    • We can also ignore them all, but we lose knowledge about the HTML stream that we could recover (e.g. distinguishing <?for-each?> from <!--for-each-->).
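The approach in the items above is to find the bogus comment first, since HTML always ends it at the first `>`, and only then ask whether it would have been a CDATA section or a PI node if HTML supported those. A standalone sketch of that idea follows. It is not the Tag Processor's implementation: `classify_bogus_comment()` and its `cdata-lookalike` / `pi-lookalike` labels are hypothetical, `$at` is assumed to already point at a bogus comment opener such as `<![CDATA[` or `<?`, and PHP 8 is assumed for `str_starts_with()` / `str_ends_with()`.

```php
<?php
/**
 * Hypothetical helper (not part of the HTML API): given the byte offset of a
 * bogus comment opener, find where HTML says the token ends, then label what
 * the author probably meant. Returns null when no `>` has been seen yet.
 */
function classify_bogus_comment( string $html, int $at ): ?array {
	// HTML ends every bogus comment at the first `>`, never at `]]>` or `?>`.
	$closer = $at + 2 <= strlen( $html ) ? strpos( $html, '>', $at + 2 ) : false;
	if ( false === $closer ) {
		return null;
	}

	$token_length = $closer + 1 - $at;
	$token        = substr( $html, $at, $token_length );

	// Only after the real boundary is known do we inspect the surrounding syntax.
	if ( str_starts_with( $token, '<![CDATA[' ) && str_ends_with( $token, ']]>' ) ) {
		$kind = 'cdata-lookalike';
	} elseif ( str_starts_with( $token, '<?' ) && str_ends_with( $token, '?>' ) ) {
		$kind = 'pi-lookalike';
	} else {
		$kind = 'bogus-comment';
	}

	return array(
		'kind'         => $kind,
		'token_length' => $token_length,
		'token'        => $token,
	);
}
```

With this ordering, `<![CDATA[5 > 3]]>` is reported as the plain bogus comment `<![CDATA[5 >`, which is where a browser ends the token, while `<?for-each?>` remains distinguishable from `<!--for-each-->`.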

Design Changes

In this change we're introducing two features stemming from two internal changes:

  • next_token() provides the ability to scan every token in the HTML stream.
  • it is possible to parse HTML in chunks without having the entire document in memory.

The internal changes powering this are:

  • internal state adopts a new parsing mode which allows resuming from the middle of a previous match.
  • the new concepts of a token proper and its modifiable text track the bounds of the currently-matched token as well as a safe region that can be changed without impacting the document syntax, if one exists.

For example, when encountering an HTML comment the parser will track the following token information:

This <!-- is a comment -->.
     │   │            │  └ end of token
     │   │            └─── end of text
     │   └──────────────── start of text
     └──────────────────── start of token
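The token/text distinction in the diagram above can be sketched for the common `<!--` case, reporting both spans as (offset, length) pairs. This is a simplified illustration under stated assumptions, not the Tag Processor's code: `scan_html_comment()` is a hypothetical name, it handles only comments that start with `<!--`, and it treats a missing closer as "incomplete" rather than applying end-of-document rules.

```php
<?php
/**
 * Hypothetical sketch: locate an HTML comment starting at $at and report the
 * token bounds and the modifiable-text bounds as (offset, length) pairs.
 * Returns null if $at is not `<!--` or if no closer has been seen yet.
 */
function scan_html_comment( string $html, int $at ): ?array {
	if ( '<!--' !== substr( $html, $at, 4 ) ) {
		return null;
	}

	$text_starts_at = $at + 4;

	// `<!-->` and `<!--->` are abruptly-closed empty comments with no text.
	foreach ( array( '>', '->' ) as $abrupt_closer ) {
		if ( substr( $html, $text_starts_at, strlen( $abrupt_closer ) ) === $abrupt_closer ) {
			return array(
				'token_starts_at' => $at,
				'token_length'    => 4 + strlen( $abrupt_closer ),
				'text_starts_at'  => $text_starts_at,
				'text_length'     => 0,
			);
		}
	}

	// Otherwise the comment ends at whichever of `-->` or `--!>` appears first.
	$closers = array();
	foreach ( array( '-->', '--!>' ) as $closer ) {
		$found = strpos( $html, $closer, $text_starts_at );
		if ( false !== $found ) {
			$closers[ $found ] = strlen( $closer );
		}
	}
	if ( empty( $closers ) ) {
		return null; // Incomplete: the closer may still arrive in a later chunk.
	}

	$text_ends_at = min( array_keys( $closers ) );

	return array(
		'token_starts_at' => $at,
		'token_length'    => $text_ends_at + $closers[ $text_ends_at ] - $at,
		'text_starts_at'  => $text_starts_at,
		'text_length'     => $text_ends_at - $text_starts_at,
	);
}
```

For `This <!-- is a comment -->.` starting at offset 5, `substr()` over the text span yields `" is a comment "`, and over the token span yields the whole `<!-- is a comment -->`.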

Not every token will have a text region, but it's important to track the entire token and any text region because similar tokens may have different syntax. For example, an invalid comment is still a comment.

This <? is also a comment --!>.
     │ │                 │   └ end of token
     │ │                 └──── end of text
     │ └────────────────────── start of text
     └──────────────────────── start of token

This holds for tokens whose entire content is text, such as the #text node.

<div>This is text.</div>
     │           ├ end of token
     │           └ end of text
     ├──────────── start of text
     └──────────── start of token

Special HTML tags have modifiable text that isn't part of .textContent or .innerText. For example, the TITLE element contains no HTML inside of it: everything is plain text, and its contents don't appear in the page. The same is true for TEXTAREA, SCRIPT, STYLE, and a few more elements.

<title>This is text <em>Not HTML</em>.</title>
│      │                             │       └ end of token
│      └ start of text               └ end of text
└──────────── start of token

Scanning tokens

To keep the next_tag() interface and its usage clear, it is left unchanged. For operations needing access to the token stream there is no built-in query mechanism; querying ought to be performed inside a next_token() loop. get_token_type() indicates what kind of token is currently matched, get_token_name() returns something that more closely matches what a DOM API would return, and get_modifiable_text() returns the modifiable text if available.

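function wp_strip_all_tags( $html, $remove_breaks = false ) {
	$processor = new WP_HTML_Tag_Processor( $html );

	$text_content = '';
	while ( $processor->next_token() ) {
		// Only #text nodes contribute to the text content; tags and comments are skipped.
		if ( '#text' === $processor->get_token_name() ) {
			$text_content .= $processor->get_modifiable_text();
		}
	}

	// Optionally collapse runs of whitespace and line breaks into single spaces.
	return $remove_breaks
		? trim( preg_replace( '/[\r\n\t ]+/', ' ', $text_content ) )
		: $text_content;
}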
  • Most tags have no modifiable content.
  • The inner contents of special tags that cannot contain HTML are their modifiable content. The inner contents of these tags are not rendered in the page.
    • IFRAME
    • NOEMBED, NOFRAMES
    • SCRIPT
    • STYLE
    • TEXTAREA [character references are decoded]
    • TITLE [character references are decoded]
    • XMP
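The list above splits the special elements by whether character references in their contents are decoded. A small sketch of that rule follows, under assumptions: `special_element_text()` is a hypothetical helper, and PHP's `html_entity_decode()` with `ENT_HTML5` is used as an approximation of HTML's own decoding rules (it does not reproduce every edge case, such as legacy references without a trailing semicolon).

```php
<?php
/**
 * Hypothetical helper: return the text a consumer would read from the raw inner
 * content of one of the special elements. TEXTAREA and TITLE decode character
 * references; the other special elements keep their contents verbatim.
 */
function special_element_text( string $tag_name, string $raw_inner ): string {
	$decodes_references = array( 'TEXTAREA', 'TITLE' );
	$verbatim           = array( 'IFRAME', 'NOEMBED', 'NOFRAMES', 'SCRIPT', 'STYLE', 'XMP' );

	$tag_name = strtoupper( $tag_name );

	if ( in_array( $tag_name, $decodes_references, true ) ) {
		// Approximation: PHP's HTML5 entity table stands in for a spec-complete decoder.
		return html_entity_decode( $raw_inner, ENT_QUOTES | ENT_HTML5, 'UTF-8' );
	}

	if ( in_array( $tag_name, $verbatim, true ) ) {
		// Raw text: `&amp;` stays `&amp;` and markup-like text stays as-is.
		return $raw_inner;
	}

	throw new InvalidArgumentException( "{$tag_name} is not a special element." );
}
```

Markup-like text such as `<img>` survives in both groups; only the handling of `&gt;` and friends differs.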

TODO

  • Add next_token() method to scan each token.
  • Stop at RCDATA sections and SCRIPT, STYLE, TITLE, TEXTAREA, etc…
  • Stop at text nodes.
  • Indicate a continuation state to support resumable parsing. This is necessary for stopping at SCRIPT tags and other tags with special closing rules. These are currently handled by skipping to the end of the element when finding the starting tag, but this has introduced a few challenges and bugs (for example, the Tag Processor fails to stop at a <title> tag if the document ends before the </title> closer is found).
  • Add rewind() method to reverse to the start of the document.

@dmsnell dmsnell force-pushed the html-api/scan-all-tokens branch from 3dede00 to a25b57a Compare November 28, 2023 21:45
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Nov 30, 2023
…, end)

This patch follows up on earlier design questions around how to represent
spans of strings inside the class. It's relevant now as preparation for WordPress#5683.

The mixture of (offset, length) and (start, end) coordinates becomes confusing
at times, and all final string operations are performed with the (offset, length)
pair, since these feed into `substr()`.

In preparation for exposing all tokens within an HTML document, this change:
 - Unifies the representation throughout the class.
 - Creates `token_starts_at` to track the start of the current token.
 - Replaces `tag_ends_at` with `token_length` for re-use with other token types.

There should be no functional or behavioral changes in this patch.

For the internal helper classes this patch introduces breaking changes, but those
classes are marked private and should not be used outside of the HTML API itself.
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Dec 10, 2023
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Dec 10, 2023
@dmsnell dmsnell force-pushed the html-api/scan-all-tokens branch 2 times, most recently from 156a31e to c22cd4b Compare December 21, 2023 21:06
@dmsnell dmsnell force-pushed the html-api/scan-all-tokens branch 2 times, most recently from f19a5cb to cc96ed2 Compare January 1, 2024 03:42
@dmsnell dmsnell marked this pull request as ready for review January 1, 2024 03:43
@dmsnell dmsnell force-pushed the html-api/scan-all-tokens branch 2 times, most recently from 51f432c to 103a556 Compare January 11, 2024 03:27
Member

@sirreal sirreal left a comment

I really like the shape this is taking. I've left several thoughts, comments, and questions from this first pass. I'd like to take a look at the processing of comments because I think we can fix that @todo in this PR.

I also want to see what feedback the html5lib tests can give us so I'll take some time to see what it looks like to run them against this PR with additional handling of more node types (one of your todo items in the description).

I haven't gone through everything yet, only the main implementation changes.

Comment on lines 312 to 326
* - `#text` nodes, whose entire token _is_ the modifiable text.
* - Comment nodes and nodes that became comments because of some syntax error. The
* text for these nodes is the portion of the comment inside of the syntax. E.g. for
* `<!-- comment -->` the text is `" comment "` (note that the spaces are part of it).
* - `CDATA` sections, whose text is the content inside of the section itself. E.g. for
* `<![CDATA[some content]]>` the text is `"some content"`.
* - "Funky comments," which are a special case of invalid closing tags whose name is
* invalid. The text for these nodes is the text that a browser would transform into
* an HTML comment when parsing. E.g. for `</%post_author>` the text is `%post_author`.
*
* And there are non-elements which are atomic in nature but have no modifiable text.
* - `DOCTYPE` nodes like `<!DOCTYPE html>` which have no closing tag.
* - The empty end tag `</>` which is ignored in the browser and DOM but exposed
* to the HTML API.

Member Author

@sirreal I'm not sure yet what to do here. can we tag this for follow-up after merge?

I'm a bit concerned about using specific property names here because this is supposed to be the explanatory section of the documentation and I don't want to couple the description to our own terms; I want it to read comfortably for someone coming in with an HTML background - that is, leave things a bit loose here to guide an understanding without pinning it to one specific technicality.

nonetheless I've taken another pass at the comment to update it based on how this has developed.

Comment on lines +483 to +497
* | *Text node* | Found a #text node; this is plaintext and modifiable. |
* | *CDATA node* | Found a CDATA section; this is modifiable. |
* | *Comment* | Found a comment or bogus comment; this is modifiable. |

Comment on lines +2541 to +2651
case self::STATE_DOCTYPE:
return '#doctype';
Member

Why do we return #doctype here? The html value we'd get from get_token_name is confusing but aligns with what the browser does.

Member Author

this is specifically to expose what kind of token it is. I don't like conflating it with the HTML tag name for an Element, even though one is lower-case and the other upper-case.

in my own explorations I found it helpful to have both functions: one to say the node name like the browser would, and one to say the node type (also like how the browser would). I've also been trying to balance the use of longer constants against clearly-searchable text values since this is a more consumer-oriented function.

switch ( $processor->get_token_type() ) {
	case WP_HTML_Processor::NODE_TYPE_DOCUMENT_TYPE:
	case '#doctype':
		…
}

at this point I'm assuming people will use the string value even if the constant exists. also I started with get_node_type() and get_node_name() but then renamed to _token_ because I wanted to support a slightly different set of kinds; I'm doubting this since discovering the challenge of partial documents with invalid comment syntaxes, but haven't completely abandoned the idea yet, particularly because of the support for presumptuous tags and funky comments, which aren't in the DOM API.
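The split argued for here, one answer for the kind of token and a separate DOM-style name, can be illustrated with a small standalone function. This is a sketch only: `dom_like_token_name()` is hypothetical, and the `'#tag'` / `'#doctype'` type strings are assumptions modeled on the `#doctype` value discussed in this thread.

```php
<?php
/**
 * Hypothetical illustration: given a token's kind (a stable, searchable string)
 * and its raw name from the markup, return a DOM-like node name.
 */
function dom_like_token_name( string $token_type, string $raw_name = '' ): string {
	switch ( $token_type ) {
		case '#tag':
			// Element names are reported upper-case, like Element.nodeName in HTML documents.
			return strtoupper( $raw_name );

		case '#doctype':
			// DocumentType.nodeName is the doctype's name, which the tokenizer lower-cases.
			return strtolower( $raw_name );

		case '#text':
		case '#comment':
		case '#cdata-section':
			// These node kinds use their type string as the node name.
			return $token_type;

		default:
			throw new InvalidArgumentException( "Unknown token type: {$token_type}" );
	}
}
```

Because the kind and the name are separate values, a `switch` on the kind never confuses a DOCTYPE (kind `#doctype`, name `html`) with an element whose name is `HTML`.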

Member

Thanks, I think this makes sense 👍

@dmsnell dmsnell force-pushed the html-api/scan-all-tokens branch 2 times, most recently from 9f29920 to 1098c19 Compare January 15, 2024 17:22
Member

@sirreal sirreal left a comment

Thanks! I think this is ready to merge.

Comment on lines 318 to 331
$processor = WP_HTML_Processor::create_fragment( '<![CDATA[this is a comment]]>' );
$processor->next_token();

$this->assertSame(
'#cdata-section',
$processor->get_token_name(),
"Should have found CDATA section name but found {$processor->get_token_name()} instead."
);
Member

I don't want to get hung up on CDATA and PI handling, this doesn't block merging this PR.

I think this behavior is what you described in Slack here:

…it finds those comments (to the first >), and then if it ends in ]]> and starts with <![CDATA[ we can safely say, "this is a CDATA node" … though the actual rules for those are more complicated and we can only support a subset now

We can discuss this more in a follow-up, but I'm reluctant to diverge from the specification. This is not a cdata-section with the text content "this is a comment" (unless we were in svg or math foreign content); it is a comment with the text content "[CDATA[this is a comment]]".

Member Author

yeah I'm torn a bit but also I find that things are slightly different since we're not building a DOM here. when considering the intentionality behind some HTML string, I think it's evident that if someone writes <![CDATA[something]]> then they clearly meant to produce what they consider a CDATA section, and WordPress itself still creates these for legacy reasons (even though it may be the case outside of WordPress's XML outputs that those aren't needed anymore).

so this does conflate with a comment whose text is [CDATA[this is a comment]], but if we only indicate that, we have also lost the ability to differentiate these two strings, which in my opinion have divergent histories and intents:

  • <!--[CDATA[this is a comment]]-->
  • <![CDATA[this is a comment]]>

what I see as the potential failure here is that we hold fixed a comment structure someone can't get rid of because we're only allowing adjustment inside the [ and ], but ultimately in the browser they both disappear as comments.

the case I was far more concerned with is the one we fixed, which is when we think that the inner text is 5 > 3 or [CDATA[5 > 3]] when in fact it's truly 5 or [CDATA[5 since these represent a divergence in token boundaries from the browser (which we still have somewhat at play inside foreign elements).

I'm having similar vibes about representing <!--> and <!---> because right now we're not exposing those as changeable comments. again, someone might miss these because of the representation, but they won't cause the parser to get off track and they won't change the rendered view of the page.

let's keep talking because I'd like to push this as far as possible. I really want it to work that we expose these as separate entities. a possible compromise is to maintain a separate indicator specifying type_of_comment which could be BOGUS_COMMENT, CDATA, VALID_COMMENT, etc…, but that also introduces more API surface so I want to have a good feeling that it's necessary before putting it there.



@dmsnell dmsnell force-pushed the html-api/scan-all-tokens branch from 0ca080a to f502153 Compare January 16, 2024 15:36
*
* <!-->
* <!--->
* <!---->
Member

This one is not abruptly closed, we have start <!-- and end -->, with empty text content.

Here are two examples from the html5lib-tests. There's no comment error with <!---->, but with <!---> there's an "abrupt-closing-of-empty-comment" error.

Suggested change
* <!---->

Member Author

thank you. the comment was wrong but the code appears to have been good.

Member Author

I actually had a last-minute mini-panic thinking that we aren't detecting <!---!> as an abruptly-closed comment, but the Tag Processor is already right! it isn't one, and the comment continues. thankfully the code in this branch and in trunk handles it properly.

@dmsnell dmsnell force-pushed the html-api/scan-all-tokens branch 2 times, most recently from 5065bee to 7d9786d Compare January 23, 2024 05:21
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Jan 23, 2024
Updates from WordPress/wordpress-develop at f4dda54df785d0a6957dedda3648f7fab58b829f

 - Coding style changes.

 - WordPress/wordpress-develop#5762
   Adds support for the "any other tag" sections in the HTML Processor.

 - WordPress/wordpress-develop#5539
   Adds support for list elements in the HTML Processor.

 - WordPress/wordpress-develop#5897
   Adds support for HR elements in the HTML Processor.

 - WordPress/wordpress-develop#5895
   Adds support for the AREA, BR, EMBED, KEYGEN, and WBR elements
   in the HTML Processor.

 - WordPress/wordpress-develop#5903
   Adds support for the PRE and LISTING elements in the HTML Processor.

 - WordPress/wordpress-develop#5913
   Updates "all other tags" support in HTML Processor and updates list
   of void elements.

 - WordPress/wordpress-develop#5906
   Adds support for the PARAM, SOURCE, and TRACK elements in the HTML Processor.

 - WordPress/wordpress-develop#5683
   Provides mechanism to scan all tokens in an HTML document in the Tag Processor.

The PHP files in the compatibility layer are merged and maintained in
the Core repo and all changes or updates need to happen first in Core
and then be brought over to Gutenberg as built files.

Co-authored-by: Sergey Biryukov <[email protected]>
Co-authored-by: Jon Surrell <[email protected]>
Member

@sirreal sirreal left a comment

I've reviewed all of the recent changes and they look good to me. I left a few suggestions and I want to make sure we add LISTING to the special handling that removes starting newlines for PRE and TEXTAREA content.

@dmsnell dmsnell force-pushed the html-api/scan-all-tokens branch 3 times, most recently from faf9cef to 9d01322 Compare January 24, 2024 21:38
Since its introduction in WordPress 6.2 the HTML Tag Processor has
provided a way to scan through all of the HTML tags in a document and
then read and modify their attributes. In order to reliably do this, it
also needed to be aware of other kinds of HTML syntax, but it didn't
expose those syntax tokens to consumers of the API.

In this patch the Tag Processor introduces a new scanning method and a
few helper methods to read information about or from each token. Most
significantly, this introduces the ability to read `#text` nodes in the
document.

What's new in the Tag Processor?
================================

 - `next_token()` visits every distinct syntax token in a document.
 - `get_token_type()` indicates what kind of token it is.
 - `get_token_name()` returns something akin to `DOMNode.nodeName`.
 - `get_modifiable_text()` returns the text associated with a token.
 - `get_comment_type()` indicates why a token represents an HTML comment.

Example usage.
==============

```php
function strip_all_tags( $html ) {
        $text_content = '';
        $processor    = new WP_HTML_Tag_Processor( $html );

        while ( $processor->next_token() ) {
                if ( '#text' !== $processor->get_token_type() ) {
                        continue;
                }

                $text_content .= $processor->get_modifiable_text();
        }

        return $text_content;
}
```

What changes in the Tag Processor?
==================================

Previously, the Tag Processor would scan the opening and closing tag of
every HTML element separately. Now, however, there are special tags
which it only visits once, as if those elements were void tags without
a closer.

These are special tags because their content contains no other HTML or
markup, only non-HTML content.

 - SCRIPT elements contain raw text which is isolated from the rest of
   the HTML document and fed separately into a JavaScript engine. There
   are complicated rules to avoid escaping the script context in the HTML.
   The contents are left verbatim, and character references are not decoded.

 - TEXTAREA and TITLE elements contain plain text which is decoded
   before display, e.g. transforming `&amp;` into `&`. Any markup which
   resembles tags is treated as verbatim text and not a tag.

 - IFRAME, NOEMBED, NOFRAMES, STYLE, and XMP elements are similar to the
   textarea and title elements, but no character references are decoded.
   For example, `&amp;` inside a STYLE element is passed to the CSS engine
   as the literal string `&amp;` and _not_ as `&`.

Because it's important not to treat this inner content separately from the
elements containing it, the Tag Processor combines them when scanning
into a single match and makes their content available as modifiable
text (see below).

This means that the Tag Processor will no longer visit a closing tag for
any of these elements unless that tag is unexpected.

    <title>There is only a single token in this line</title>
    <title>There are two tokens in this line</title></title>
    </title><title>There are still two tokens in this line</title>
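Matching a special element as a single token comes down to finding its closer, and that search can be sketched in isolation. The sketch below is hypothetical (`find_special_element_end()` is not an HTML API method) and simplified: it ignores attributes on the closing tag, and it does not model SCRIPT's extra escaping states, so it only approximates elements like TITLE, TEXTAREA, and STYLE. Returning null when the closer is missing is one way a resumable parser could report "incomplete" instead of skipping the element. PHP 8 is assumed, since `substr()` past the end returns an empty string there.

```php
<?php
/**
 * Hypothetical sketch: given the offset just past a special element's opening tag
 * (e.g. right after `<title>`), find the end of its closing tag so that opener,
 * contents, and closer can be reported as one token. Returns null when the
 * closer has not arrived yet.
 */
function find_special_element_end( string $html, int $after_opener, string $tag_name ): ?array {
	$needle = '</' . strtolower( $tag_name );
	$offset = $after_opener;

	while ( false !== ( $closer_at = stripos( $html, $needle, $offset ) ) ) {
		$next = substr( $html, $closer_at + strlen( $needle ), 1 );

		if ( '' === $next ) {
			return null; // Input ends right after `</title`; more input could complete it.
		}

		// The closer's name must end here: `</titles>` does not close a TITLE.
		if ( false !== strpos( " \t\n\f\r/>", $next ) ) {
			// Simplification: skip straight to the next `>`, ignoring closer attributes.
			$gt = strpos( $html, '>', $closer_at );
			if ( false === $gt ) {
				return null;
			}

			return array(
				'text_starts_at' => $after_opener,
				'text_length'    => $closer_at - $after_opener,
				'token_ends_at'  => $gt + 1,
			);
		}

		$offset = $closer_at + 1;
	}

	return null; // No closer yet: the element is incomplete.
}
```

For `<title>There are two tokens in this line</title></title>`, the first match ends right after the first `</title>`, leaving the stray `</title>` as the second token.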

What are tokens?
================

The term "token" here is a parsing term, which means a primitive unit in
HTML. There are only a few kinds of tokens in HTML:

 - a tag has a name, attributes, and a closing or self-closing flag.
 - a text node, or `#text` node, contains plain text which is displayed
   in a browser and which is decoded before display.
 - a DOCTYPE declaration indicates how to parse the document.
 - a comment is hidden from the display on a page but present in the HTML.

There are a few more kinds of tokens that the HTML Tag Processor will
recognize, some of which don't exist as concepts in HTML. These mostly
comprise XML syntax elements that aren't part of HTML (such as CDATA and
processing instructions) and invalid HTML syntax that transforms into
comments.

What is a funky comment?
========================

This patch treats a specific kind of invalid comment in a special way.
A closing tag with an invalid name is considered a "funky comment." In
the browser these become HTML comments just like any other, but their
syntax is convenient for representing a variety of bits of information
in a well-defined way and which cannot be nested or recursive, given
the parsing rules handling this invalid syntax.

 - `</1>`
 - `</%avatar_url>`
 - `</{"wp_bit": {"type": "post-author"}}>`
 - `</[post-author]>`
 - `</__( 'Save Post' );>`

All of these examples become HTML comments in the browser. The content
inside the funky comment is easily parsable, whereby the only rule is
that it starts at the `<` and continues until the nearest `>`. There
can be no funky comment inside another, because that would imply having
a `>` inside of one, which would actually terminate the first one.
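The parsing rule above is simple enough to sketch directly: a `</` followed by something other than an ASCII letter or `>` starts a funky comment, and its text runs to the nearest `>`. `read_funky_comment()` is a hypothetical name used only for this illustration; it returns null for ordinary closing tags, for the ignored `</>`, and for input that ends before a `>` appears.

```php
<?php
/**
 * Hypothetical sketch: if $at starts a "funky comment" (a closing tag whose name
 * does not start with an ASCII letter), return its text, which runs from just
 * after `</` to the nearest `>`.
 */
function read_funky_comment( string $html, int $at ): ?string {
	if ( '</' !== substr( $html, $at, 2 ) ) {
		return null;
	}

	$first = substr( $html, $at + 2, 1 );

	// `</` + ASCII letter is a real closing tag, `</>` is ignored, and no byte at all is incomplete.
	if ( '' === $first || '>' === $first || 1 === preg_match( '/[A-Za-z]/', $first ) ) {
		return null;
	}

	// The text runs to the nearest `>`, so one funky comment can never contain another.
	$closer = strpos( $html, '>', $at + 2 );
	if ( false === $closer ) {
		return null;
	}

	return substr( $html, $at + 2, $closer - $at - 2 );
}
```

Applied to `</%a </%b> c>`, this returns `%a </%b`: the inner `</%b` cannot open a second funky comment because the first `>` already ends the outer one.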

What is modifiable text?
========================

Modifiable text is similar to the `innerText` property of a DOM node.
It represents the span of text for a given token which may be modified
without changing the structure of the HTML document or the token.

There is currently no mechanism to change the modifiable text, but this
is planned to arrive in a later patch.

Tags
====

Most tags have no modifiable text because they have child nodes where
text nodes are found. Only the special tags mentioned above have
modifiable text.

    <div class="post">Another day in HTML</div>
    └─ tag ──────────┘└─ text node ─────┘└────┴─ tag

    <title>Is <img> &gt; <image>?</title>
    │      └ modifiable text ───┘       │ "Is <img> > <image>?"
    └─ tag ─────────────────────────────┘

Text nodes
==========

Text nodes are entirely modifiable text.

    This HTML document has no tags.
    └─ modifiable text ───────────┘

Comments
========

The modifiable text inside a comment is the portion of the comment that
doesn't form its syntax. This applies for a number of invalid comments.

    <!-- this is inside a comment -->
    │   └─ modifiable text ──────┘  │
    └─ comment token ───────────────┘

    <!-->
    This invalid comment has no modifiable text.

    <? this is an invalid comment -->
    │ └─ modifiable text ────────┘  │
    └─ comment token ───────────────┘

    <![CDATA[this is an invalid comment]]>
    │        └─ modifiable text ───────┘ │
    └─ comment token ────────────────────┘

Other token types also have modifiable text. Consult the code or tests
for further information.
@dmsnell dmsnell force-pushed the html-api/scan-all-tokens branch from 9d01322 to 30991d7 Compare January 24, 2024 21:47
pento pushed a commit that referenced this pull request Jan 24, 2024
Since its introduction in WordPress 6.2 the HTML Tag Processor has
provided a way to scan through all of the HTML tags in a document and
then read and modify their attributes. In order to reliably do this, it
also needed to be aware of other kinds of HTML syntax, but it didn't
expose those syntax tokens to consumers of the API.

In this patch the Tag Processor introduces a new scanning method and a
few helper methods to read information about or from each token. Most
significantly, this introduces the ability to read `#text` nodes in the
document.

What's new in the Tag Processor?
================================

 - `next_token()` visits every distinct syntax token in a document.
 - `get_token_type()` indicates what kind of token it is.
 - `get_token_name()` returns something akin to `DOMNode.nodeName`.
 - `get_modifiable_text()` returns the text associated with a token.
 - `get_comment_type()` indicates why a token represents an HTML comment.

Example usage.
==============

{{{
<?php
function strip_all_tags( $html ) {
        $text_content = '';
        $processor    = new WP_HTML_Tag_Processor( $html );

        while ( $processor->next_token() ) {
                if ( '#text' !== $processor->get_token_type() ) {
                        continue;
                }

                $text_content .= $processor->get_modifiable_text();
        }

        return $text_content;
}
}}}

What changes in the Tag Processor?
==================================

Previously, the Tag Processor would scan the opening and closing tag of
every HTML element separately. Now, however, there are special tags
which it only visits once, as if those elements were void tags without
a closer.

These are special tags because their content contains no other HTML or
markup, only non-HTML content.

 - SCRIPT elements contain raw text which is isolated from the rest of
   the HTML document and fed separately into a JavaScript engine. There
   are complicated rules to avoid escaping the script context in the HTML.
   The contents are left verbatim, and character references are not decoded.

 - TEXTAREA and TITLE elements contain plain text which is decoded
   before display, e.g. transforming `&amp;` into `&`. Any markup which
   resembles tags is treated as verbatim text and not a tag.

 - IFRAME, NOEMBED, NOFRAMES, STYLE, and XMP elements are similar to the
   textarea and title elements, but no character references are decoded.
   For example, `&amp;` inside a STYLE element is passed to the CSS engine
   as the literal string `&amp;` and _not_ as `&`.
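The decoding difference between these groups can be sketched in plain PHP. The function `sketch_special_element_text()` below is hypothetical and not part of the HTML API. It uses PHP's `html_entity_decode()`, which only approximates the HTML5 character reference rules (for example, it is stricter about references without a trailing semicolon).

```php
<?php
/**
 * Hypothetical helper, not part of the HTML API: returns the text a browser
 * would use for a special element, given its raw inner content.
 */
function sketch_special_element_text( string $tag_name, string $raw_inner ): string {
	switch ( strtoupper( $tag_name ) ) {
		// RCDATA: tag-like markup stays literal, character references decode.
		case 'TEXTAREA':
		case 'TITLE':
			// Approximation: html_entity_decode() is not the full HTML5 algorithm.
			return html_entity_decode( $raw_inner, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

		// RAWTEXT elements and SCRIPT: content passes through verbatim.
		case 'IFRAME':
		case 'NOEMBED':
		case 'NOFRAMES':
		case 'SCRIPT':
		case 'STYLE':
		case 'XMP':
			return $raw_inner;

		default:
			throw new InvalidArgumentException( "{$tag_name} is not a special element." );
	}
}

echo sketch_special_element_text( 'TITLE', 'Fish &amp; <b>Chips</b>' ), "\n"; // Fish & <b>Chips</b>
echo sketch_special_element_text( 'STYLE', 'a::after { content: "&amp;"; }' ), "\n"; // a::after { content: "&amp;"; }
```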

Because this inner content must not be treated separately from the
element containing it, the Tag Processor combines the opening tag, the
content, and the closing tag into a single match when scanning, and it
makes the content available as modifiable text (see below).

This means that the Tag Processor will no longer visit a closing tag for
any of these elements unless that tag is unexpected.

{{{
    <title>There is only a single token in this line</title>
    <title>There are two tokens in this line</title></title>
    </title><title>There are still two tokens in this line</title>
}}}
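Finding where one of these elements ends follows the HTML tokenizer's "appropriate end tag" rule: a `</` followed by the element's name (case-insensitively) and then whitespace, `/`, or `>`. The function `sketch_find_special_closer()` below is a hypothetical sketch of that rule, not the Tag Processor's internal code, and it leaves out SCRIPT because SCRIPT has additional escaping states.

```php
<?php
/**
 * Hypothetical helper, not part of the HTML API: returns the byte offset of
 * the "</" that closes an RCDATA or RAWTEXT element such as TITLE, TEXTAREA,
 * or STYLE, searching from $offset (just after the opening tag). Returns null
 * when the element is never closed. SCRIPT is out of scope.
 */
function sketch_find_special_closer( string $html, string $tag_name, int $offset ): ?int {
	$name_length = strlen( $tag_name );
	$html_length = strlen( $html );

	while ( false !== ( $at = strpos( $html, '</', $offset ) ) ) {
		$after_name = $at + 2 + $name_length;

		// The name must match case-insensitively and be followed by
		// whitespace, "/", or ">"; "</titles>" does not close a TITLE.
		if (
			$after_name < $html_length &&
			0 === substr_compare( $html, $tag_name, $at + 2, $name_length, true ) &&
			false !== strpos( " \t\n\f\r/>", $html[ $after_name ] )
		) {
			return $at;
		}

		// Anything else is still text inside the element; keep looking.
		$offset = $at + 2;
	}

	return null;
}
```

For `<title>There are two tokens in this line</title></title>`, searching from offset 7 stops at the first `</title>`, so the whole element is one token and the second `</title>` is left over as an unexpected closer.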

What are tokens?
================

The term "token" here is a parsing term, which means a primitive unit in
HTML. There are only a few kinds of tokens in HTML:

 - a tag has a name, attributes, and a closing or self-closing flag.
 - a text node, or `#text` node, contains plain text which is displayed
   in a browser and which is decoded before display.
 - a DOCTYPE declaration indicates how to parse the document.
 - a comment is hidden from the display on a page but present in the HTML.

There are a few more kinds of tokens that the HTML Tag Processor will
recognize, some of which don't exist as concepts in HTML. These mostly
comprise XML syntax elements that aren't part of HTML (such as CDATA and
processing instructions) and invalid HTML syntax that transforms into
comments.
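Which of these kinds starts at a given `<` can mostly be read from its first few bytes. The function `sketch_token_kind()` below is a hypothetical illustration of that first step only; the real Tag Processor performs a full parse, and the labels here are illustrative, not the strings returned by `get_token_type()`.

```php
<?php
/**
 * Hypothetical sketch, not the Tag Processor's algorithm: classify the token
 * starting at byte offset $at using only its leading bytes.
 */
function sketch_token_kind( string $html, int $at ): string {
	$c1 = $html[ $at ] ?? '';
	$c2 = $html[ $at + 1 ] ?? '';
	$c3 = $html[ $at + 2 ] ?? '';

	if ( '<' !== $c1 ) {
		return 'text';
	}

	// "<" + ASCII letter opens a tag; "</" + ASCII letter closes one.
	if ( ctype_alpha( $c2 ) || ( '/' === $c2 && ctype_alpha( $c3 ) ) ) {
		return 'tag';
	}

	if ( '/' === $c2 ) {
		if ( '' === $c3 ) {
			return 'text'; // "</" at the end of input is plain text.
		}
		// Browsers drop "</>" entirely; any other invalid closer is a funky comment.
		return '>' === $c3 ? 'ignored' : 'funky comment';
	}

	if ( '!' === $c2 ) {
		if ( '<!--' === substr( $html, $at, 4 ) ) {
			return 'comment';
		}
		if ( 0 === strncasecmp( substr( $html, $at, 9 ), '<!DOCTYPE', 9 ) ) {
			return 'doctype';
		}
		return 'comment'; // Bogus comment, e.g. a CDATA lookalike.
	}

	// "<?" starts a bogus comment; "<" followed by anything else is text.
	return '?' === $c2 ? 'comment' : 'text';
}
```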

What is a funky comment?
========================

This patch treats a specific kind of invalid comment in a special way.
A closing tag with an invalid name is considered a "funky comment." In
the browser these become HTML comments just like any other, but their
syntax is convenient for representing a variety of bits of information
in a well-defined way and which cannot be nested or recursive, given
the parsing rules handling this invalid syntax.

 - `</1>`
 - `</%avatar_url>`
 - `</{"wp_bit": {"type": "post-author"}}>`
 - `</[post-author]>`
 - `</__( 'Save Post' );>`

All of these examples become HTML comments in the browser. Their content
is easy to parse, because the only rule is that the token starts at the
`<` and continues until the nearest following `>`. One funky comment
cannot appear inside another, because that would require a `>` inside
the outer one, which would instead terminate it.
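Because of that single rule, a regular expression is enough to pull the contents out of funky comments in a simple string. The function `sketch_funky_comments()` is a hypothetical sketch, not an HTML API method; it ignores context, so it would also match bytes inside elements like TITLE or SCRIPT where they are plain text, and it skips an unterminated `</%...` at the end of the input even though a browser would still treat that as a comment.

```php
<?php
/**
 * Hypothetical sketch: collect the contents of every funky comment in $html,
 * i.e. every "</" followed by a byte that is not an ASCII letter (and not
 * ">"), up to the first ">" after it.
 */
function sketch_funky_comments( string $html ): array {
	// "[^>]*" stops at the first ">", which is why funky comments cannot nest.
	preg_match_all( '~</([^a-zA-Z>][^>]*)>~', $html, $matches );

	return $matches[1];
}

print_r( sketch_funky_comments( 'By </[post-author]> at </%avatar_url>' ) );
```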

What is modifiable text?
========================

Modifiable text is similar to the `innerText` property of a DOM node.
It represents the span of text for a given token which may be modified
without changing the structure of the HTML document or the token.

There is currently no mechanism to change the modifiable text, but this
is planned to arrive in a later patch.

Tags
====

Most tags have no modifiable text because they have child nodes where
text nodes are found. Only the special tags mentioned above have
modifiable text.

{{{
    <div class="post">Another day in HTML</div>
    └─ tag ──────────┘└─ text node ─────┘└────┴─ tag
}}}

{{{
    <title>Is <img> &gt; <image>?</title>
    │      └ modifiable text ───┘       │ "Is <img> > <image>?"
    └─ tag ─────────────────────────────┘
}}}

Text nodes
==========

Text nodes are entirely modifiable text.

{{{
    This HTML document has no tags.
    └─ modifiable text ───────────┘
}}}

Comments
========

The modifiable text inside a comment is the portion of the comment that
doesn't form its syntax. This also applies to a number of invalid comments.

{{{
    <!-- this is inside a comment -->
    │   └─ modifiable text ──────┘  │
    └─ comment token ───────────────┘
}}}

{{{
    <!-->
    This invalid comment has no modifiable text.
}}}

{{{
    <? this is an invalid comment -->
    │ └─ modifiable text ────────┘  │
    └─ comment token ───────────────┘
}}}

{{{
    <![CDATA[this is an invalid comment]]>
    │        └─ modifiable text ──────┘  │
    └─ comment token ────────────────────┘
}}}
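For normal `<!--` comments, the modifiable text in the diagrams can be approximated with a short sketch. The function `sketch_comment_text()` is hypothetical and not part of the HTML API. It returns the text between `<!--` and the first `-->` (or the error-tolerant `--!>`), treats `<!-->` and `<!--->` as empty, runs an unclosed comment to the end of the input, and does not handle the invalid comment forms.

```php
<?php
/**
 * Hypothetical sketch, not the Tag Processor's implementation: return the
 * modifiable text of a "<!--" comment starting at byte offset $at, or null
 * if no such comment starts there.
 */
function sketch_comment_text( string $html, int $at = 0 ): ?string {
	if ( '<!--' !== substr( $html, $at, 4 ) ) {
		return null; // Not a normal comment; bogus comments follow other rules.
	}

	$text_start = $at + 4;

	// "<!-->" and "<!--->" close immediately and carry no modifiable text.
	if ( '>' === substr( $html, $text_start, 1 ) || '->' === substr( $html, $text_start, 2 ) ) {
		return '';
	}

	// A comment ends at the first "-->" or at the error-tolerant "--!>".
	if ( 1 === preg_match( '/--!?>/', $html, $match, PREG_OFFSET_CAPTURE, $text_start ) ) {
		return substr( $html, $text_start, $match[0][1] - $text_start );
	}

	// An unclosed comment runs to the end of the document.
	return substr( $html, $text_start );
}
```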

Other token types also have modifiable text. Consult the code or tests
for further information.

Developed in #5683
Discussed in https://core.trac.wordpress.org/ticket/60170

Follows [57575]

Props bernhard-reiter, dlh, dmsnell, jonsurrell, zieladam
Fixes #60170



git-svn-id: https://develop.svn.wordpress.org/trunk@57348 602fd350-edb4-49c9-b593-d223f7449a82
markjaquith pushed a commit to markjaquith/WordPress that referenced this pull request Jan 24, 2024
Built from https://develop.svn.wordpress.org/trunk@57348


git-svn-id: http://core.svn.wordpress.org/trunk@56854 1a063a9b-81f0-0310-95a4-ce76da25c4cd
@dmsnell
Member Author

dmsnell commented Jan 24, 2024

Merged in 57348
616e673

@dmsnell dmsnell closed this Jan 24, 2024
github-actions bot pushed a commit to platformsh/wordpress-performance that referenced this pull request Jan 24, 2024
Built from https://develop.svn.wordpress.org/trunk@57348


git-svn-id: https://core.svn.wordpress.org/trunk@56854 1a063a9b-81f0-0310-95a4-ce76da25c4cd
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Jan 25, 2024
Updates from WordPress/wordpress-develop at f4dda54df785d0a6957dedda3648f7fab58b829f

 - Coding style changes.

 - WordPress/wordpress-develop#5762
   Adds support for the "any other tag" sections in the HTML Processor.

 - WordPress/wordpress-develop#5539
   Adds support for list elements in the HTML Processor.

 - WordPress/wordpress-develop#5897
   Adds support for HR elements in the HTML Processor.

 - WordPress/wordpress-develop#5895
   Adds support for the AREA, BR, EMBED, KEYGEN, and WBR elements
   in the HTML Processor.

 - WordPress/wordpress-develop#5903
   Adds support for the PRE and LISTING elements in the HTML Processor.

 - WordPress/wordpress-develop#5913
   Updates "all other tags" support in HTML Processor and updates list
   of void elements.

 - WordPress/wordpress-develop#5906
   Adds support for the PARAM, SOURCE, and TRACK elements in the HTML Processor.

 - WordPress/wordpress-develop#5683
   Provides mechanism to scan all tokens in an HTML document in the Tag Processor.

The PHP files in the compatibility layer are merged and maintained in
the Core repo and all changes or updates need to happen first in Core
and then be brought over to Gutenberg as built files.

Co-authored-by: Sergey Biryukov <[email protected]>
Co-authored-by: Jon Surrell <[email protected]>
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Jan 29, 2024
Updates from WordPress/wordpress-develop at f4dda54df785d0a6957dedda3648f7fab58b829f

 - Coding style changes.

 - WordPress/wordpress-develop#5762
   Adds support for the "any other tag" sections in the HTML Processor.

 - WordPress/wordpress-develop#5539
   Adds support for list elements in the HTML Processor.

 - WordPress/wordpress-develop#5897
   Adds support for HR elements in the HTML Processor.

 - WordPress/wordpress-develop#5895
   Adds support for the AREA, BR, EMBED, KEYGEN, and WBR elements
   in the HTML Processor.

 - WordPress/wordpress-develop#5903
   Adds support for the PRE and LISTING elements in the HTML Processor.

 - WordPress/wordpress-develop#5913
   Updates "all other tags" support in HTML Processor and updates list
   of void elements.

 - WordPress/wordpress-develop#5906
   Adds support for the PARAM, SOURCE, and TRACK elements in the HTML Processor.

 - WordPress/wordpress-develop#5683
   Provides mechanism to scan all tokens in an HTML document in the Tag Processor.

 - WordPress/wordpress-develop#5907
   Adds support for the INPUT element in the HTML Processor

The PHP files in the compatibility layer are merged and maintained in
the Core repo and all changes or updates need to happen first in Core
and then be brought over to Gutenberg as built files.

Co-authored-by: Sergey Biryukov <[email protected]>
Co-authored-by: Jon Surrell <[email protected]>
@dmsnell dmsnell deleted the html-api/scan-all-tokens branch February 1, 2024 00:15
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Feb 2, 2024
When parser states were introduced in WordPress#5683,
nothing in the `seek()` method reset the parser state. This is
problematic because it could leave the parser in the wrong state.

In this patch the parser state is reset so that it gets properly
adjusted on the next call to `next_token()`.
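The class of bug fixed here can be shown with a toy scanner. `Sketch_Scanner` is hypothetical and is not the Tag Processor; its tokens are just whitespace-separated words. It keeps a parser state next to its cursor, so `seek()` has to reset both and let the following `next_token()` call re-derive the state, otherwise a scanner that already reached the end would refuse to scan again.

```php
<?php
/**
 * Hypothetical sketch, not the Tag Processor's code. Tokens are simply
 * whitespace-separated words.
 */
final class Sketch_Scanner {
	private array $words;
	private int $index     = -1;      // Position of the current word.
	private string $state  = 'ready'; // 'ready', 'matched', or 'complete'.
	private array $bookmarks = array();

	public function __construct( string $text ) {
		$this->words = preg_split( '/\s+/', trim( $text ) );
	}

	public function next_token(): bool {
		// Once complete, the scanner refuses to advance.
		if ( 'complete' === $this->state ) {
			return false;
		}

		if ( ++$this->index >= count( $this->words ) ) {
			$this->state = 'complete';
			return false;
		}

		$this->state = 'matched';
		return true;
	}

	public function get_token(): ?string {
		return 'matched' === $this->state ? $this->words[ $this->index ] : null;
	}

	public function set_bookmark( string $name ): void {
		$this->bookmarks[ $name ] = $this->index;
	}

	public function seek( string $name ): bool {
		if ( ! isset( $this->bookmarks[ $name ] ) ) {
			return false;
		}

		// Step back to just before the bookmark and reset the state so the
		// following next_token() call re-derives it. Leaving 'complete' in
		// place would make every later scan fail immediately.
		$this->index = $this->bookmarks[ $name ] - 1;
		$this->state = 'ready';

		return $this->next_token();
	}
}
```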

Follows [57348]

Props @kevin940726 for finding and reporting.
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Feb 6, 2024
Updates from WordPress/wordpress-develop:
 - From: WordPress/wordpress-develop@54a09a7
 - To: WordPress/wordpress-develop@7a71339

 - Coding style changes.

 - WordPress/wordpress-develop#5762
   Adds support for the "any other tag" sections in the HTML Processor.

 - WordPress/wordpress-develop#5539
   Adds support for list elements in the HTML Processor.

 - WordPress/wordpress-develop#5897
   Adds support for HR elements in the HTML Processor.

 - WordPress/wordpress-develop#5895
   Adds support for the AREA, BR, EMBED, KEYGEN, and WBR elements
   in the HTML Processor.

 - WordPress/wordpress-develop#5903
   Adds support for the PRE and LISTING elements in the HTML Processor.

 - WordPress/wordpress-develop#5913
   Updates "all other tags" support in HTML Processor and updates list
   of void elements.

 - WordPress/wordpress-develop#5906
   Adds support for the PARAM, SOURCE, and TRACK elements in the HTML Processor.

 - WordPress/wordpress-develop#5907
   Adds support for the INPUT element in the HTML Processor

 - WordPress/wordpress-develop#5683
   Provides mechanism to scan all tokens in an HTML document in the Tag Processor.

 - WordPress/wordpress-develop#5976
   Avoids splitting text nodes on "<" character.

 - WordPress/wordpress-develop#5992
   Only recognize true CDATA-lookalike nodes.

 - WordPress/wordpress-develop#5975
   Prevent void tag nesting when calling `next_token()`

 - WordPress/wordpress-develop#6021
   Reset parser state after seeking.

 - https://core.trac.wordpress.org/changeset/57528
   Fix typo in setting token flag.

 - WordPress/wordpress-develop#6041
   Ensure consecutive text is all joined into one text node.

The PHP files in the compatibility layer are merged and maintained in
the Core repo and all changes or updates need to happen first in Core
and then be brought over to Gutenberg as built files.

Co-authored-by: sergeybiryukov <[email protected]>
Co-authored-by: sirreal <[email protected]>
Co-authored-by: dmsnell <[email protected]>
draganescu pushed a commit to draganescu/wordpress-develop that referenced this pull request Feb 8, 2024
* I18N: Prevent PHP warning in `WP_Textdomain_Registry`.

Prevents a warning upon cache invalidation after language pack updates if the arguments don’t have the expected format.

Follow-up to [57287], [57290], [57298], [57299].

See #58919.

git-svn-id: https://develop.svn.wordpress.org/trunk@57303 602fd350-edb4-49c9-b593-d223f7449a82

* Twenty Twenty-Four: Remove extra tab character inside the text domain.

Follow-up to [57281].

Props sabernhardt.
Fixes #60245.

git-svn-id: https://develop.svn.wordpress.org/trunk@57304 602fd350-edb4-49c9-b593-d223f7449a82

* I18N: Fix duplicate `determine_locale()` tests added in [57286].

Props johnbillion.
See #58696.

git-svn-id: https://develop.svn.wordpress.org/trunk@57305 602fd350-edb4-49c9-b593-d223f7449a82

* Embeds: Ensure the deprecated function `print_emoji_styles` isn't used

Ensure that the proper new function wp_enqueue_emoji_styles is used in embeds.

Follow-up to: [56194].

Props peterwilsoncc, bobbingwide, hellofromTonya.
Fixes #59892. See: #58775.



git-svn-id: https://develop.svn.wordpress.org/trunk@57306 602fd350-edb4-49c9-b593-d223f7449a82

* Build/Test Tools: Fix unstable query tests.

Three `WP_Query` tests could randomly fail due to an undefined order because two test posts were using the exact same `post_date`.

Props boonebgorges, flixos90.
Fixes #60288.


git-svn-id: https://develop.svn.wordpress.org/trunk@57308 602fd350-edb4-49c9-b593-d223f7449a82

* Docs: Fix several typos in inline comments.

Follow-up to [7747], [27419], [55155].

Props shailu25, sabernhardt.
Fixes #60285.

git-svn-id: https://develop.svn.wordpress.org/trunk@57309 602fd350-edb4-49c9-b593-d223f7449a82

* Media: Redirect inactive attachment pages for logged-out users.

Ensure logged out users are redirected to the media file when attachment pages are inactive. This removes the `read_post` capability check from the canonical redirects as anonymous users lack the permission.

Follow-up to [56657], [56658], [56711].

Props afercia, aristath, chesio, joppuyo, jorbin, lakshmananphp, poena, sergeybiryukov.
Fixes #59866.
See #57913.



git-svn-id: https://develop.svn.wordpress.org/trunk@57310 602fd350-edb4-49c9-b593-d223f7449a82

* Twenty Twenty: Move the Inter font declaration to a separate file and enqueue the file.

This allows the font to be dequeued by a child theme or plugin.

Props poena, markhowellsmead, nielslange, Otto42, SGr33n, mukesh27, joemcgill.
Fixes #48630.

git-svn-id: https://develop.svn.wordpress.org/trunk@57311 602fd350-edb4-49c9-b593-d223f7449a82

* Bootstrap/Load: Introduce functions to check whether WordPress is serving a REST API request.

This changeset introduces two functions:
* `wp_is_serving_rest_request()` returns a boolean for whether WordPress is serving an actual REST API request.
* `wp_is_rest_endpoint()` returns a boolean for whether a WordPress REST API endpoint is currently being used. While this is always the case if `wp_is_serving_rest_request()` returns `true`, the function additionally covers the scenario of internal REST API requests, i.e. where WordPress calls a REST API endpoint within the same request.

Both functions should only be used after the `parse_request` action.

All relevant manual checks have been adjusted to use one of the new functions, depending on the use-case. They were all using the same constant check so far, while in fact some of them were intending to check for an actual REST API request while others were intending to check for REST endpoint usage.

A new filter `wp_is_rest_endpoint` can be used to alter the return value of the `wp_is_rest_endpoint()` function.

Props lots.0.logs, TimothyBlynJacobs, flixos90, joehoyle, peterwilsoncc, swissspidy, SergeyBiryukov, pento, mikejolley, iandunn, hellofromTonya, Cybr, petitphp.
Fixes #42061.


git-svn-id: https://develop.svn.wordpress.org/trunk@57312 602fd350-edb4-49c9-b593-d223f7449a82

* Twenty Twenty: Add missing comma in `twentytwenty_classic_editor_styles()`.

This resolves a WPCS error:
{{{
There should be a comma after the last array item in a multi-line array.
}}}

Follow-up to [57311].

See #48630.

git-svn-id: https://develop.svn.wordpress.org/trunk@57313 602fd350-edb4-49c9-b593-d223f7449a82

* HTML API: Add support for HR element.

Adds support for the following HTML elements to the HTML Processor:

 - HR

Previously, this element was not supported and the HTML Processor would bail when encountering
it. Now, with this patch, it will proceed to parse an HTML document when encountering one.

Developed in WordPress/wordpress-develop#5897

Props jonsurrell, dmsnell
Fixes #60283



git-svn-id: https://develop.svn.wordpress.org/trunk@57314 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Support deferred block variation initialization on the server.

When registering blocks on the server using `register_block_type()` or similar functions, a set of block type variations can also be registered. However, in some cases building this variation data during block registration can be an expensive process, which is not needed in most contexts. 

To address this problem, this adds support to the `WP_Block_Type` object for a new property, `variation_callback`, which can be used to register a callback for building variation data only when the block variations data is needed. The `WP_Block_Type::variations` property has been changed to a private property that is now accessed through the magic `__get()` method. The magic getter makes use of a new public method, `WP_Block_Type::get_variations` which will build variations from a registered callback if variations have not already been built.
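The deferred initialization described above can be sketched in plain PHP. `Sketch_Block_Type` is a hypothetical stand-in invented for illustration, not `WP_Block_Type`. It shows the general pattern: a private property that stays null until first use, a public getter that runs the callback at most once, and a magic `__get()` that keeps reads of the old public property working.

```php
<?php
/**
 * Hypothetical stand-in for the lazy-variations pattern; not WP_Block_Type.
 */
class Sketch_Block_Type {
	/** @var callable|null Builds the variations only when they are needed. */
	public $variation_callback = null;

	/** @var array|null Stays null until variations are provided or built. */
	private $variations = null;

	public function __construct( array $args = array() ) {
		$this->variation_callback = $args['variation_callback'] ?? null;
		$this->variations         = $args['variations'] ?? null;
	}

	public function get_variations(): array {
		// Run the possibly expensive callback at most once, on first access.
		if ( null === $this->variations && is_callable( $this->variation_callback ) ) {
			$this->variations = call_user_func( $this->variation_callback );
		}

		return $this->variations ?? array();
	}

	// Reading the now-private property from outside lands here, so existing
	// "$block_type->variations" reads keep working.
	public function __get( string $name ) {
		return 'variations' === $name ? $this->get_variations() : null;
	}
}
```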

Props spacedmonkey, thekt12, Mamaduka, gaambo, gziolo, mukesh27, joemcgill.
Fixes #59969.


git-svn-id: https://develop.svn.wordpress.org/trunk@57315 602fd350-edb4-49c9-b593-d223f7449a82

* HTML API: Add support for BR, EMBED, & other tags.

Adds support for the following HTML elements to the HTML Processor:

 - AREA, BR, EMBED, KEYGEN, WBR
 - Only the opening BR tag is supported, as the invalid closer `</br>`
   involves more complicated rules, to be implemented later.

Previously, these elements were not supported and the HTML Processor
would bail when encountering them. With this patch it will proceed to
parse an HTML document when encountering those tags as long as other
normal conditions don't cause it to bail (such as complicated format
reconstruction rules).

Props jonsurrell, dmsnell
Fixes #60283



git-svn-id: https://develop.svn.wordpress.org/trunk@57316 602fd350-edb4-49c9-b593-d223f7449a82

* HTML API: Add support for PRE and LISTING elements.

Adds support for the following HTML elements to the HTML Processor:

 - PRE, LISTING

Previously, these elements were not supported and the HTML Processor would bail when encountering them. Now, with this patch applied, it will proceed to parse an HTML document when encountering those tags.

Developed in WordPress/wordpress-develop#5903

Props jonsurrell, dmsnell
Fixes #60283



git-svn-id: https://develop.svn.wordpress.org/trunk@57317 602fd350-edb4-49c9-b593-d223f7449a82

* Media: Revert [57310].

[57310] reintroduced a minor data exposure issue.

Props swissspidy.
See #59866, #57913.



git-svn-id: https://develop.svn.wordpress.org/trunk@57318 602fd350-edb4-49c9-b593-d223f7449a82

* HTML API: Cleanup tests and list of void elements.

This patch adds newly supported elements to tests that should have been updated
in recent PRs, but which were merged without those updates. Those PRs removed
failing tests showing that the elements were unsupported, but did not add the
elements to the list of supported ones.

It also removes some elements from the special-exclusion list of unsupported IN
BODY elements. These did not show up as failing tests because earlier
conditions in the switch structure caught the tags before they reached the
default block.

Finally, it adds some missing elements to the list of void elements. These
elements are not listed as void in the HTML specification because they are
deprecated. However, they are treated as void for the sake of HTML
serialization and the parsing rules indicate that they behave as void elements,
so it's safe to list them within the HTML API as void.
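Below is a sketch of a void-element lookup. It assumes the element list used by the HTML specification's serialization algorithm, which adds the deprecated BASEFONT, BGSOUND, FRAME, KEYGEN, and PARAM elements to the standard void elements. The exact list in the HTML API may differ, and `example_is_void_element()` is a hypothetical helper.

```php
<?php
/**
 * Hypothetical helper: reports whether a tag never has a closing tag or
 * children. Uses the HTML void elements plus the deprecated elements that
 * HTML serialization also treats as void.
 */
function example_is_void_element( $tag_name ) {
	static $void_elements = array(
		// Void elements in the HTML specification.
		'AREA', 'BASE', 'BR', 'COL', 'EMBED', 'HR', 'IMG', 'INPUT',
		'LINK', 'META', 'SOURCE', 'TRACK', 'WBR',
		// Deprecated, but still treated as void.
		'BASEFONT', 'BGSOUND', 'FRAME', 'KEYGEN', 'PARAM',
	);

	// Tag names are ASCII case-insensitive, so normalize before comparing.
	return in_array( strtoupper( $tag_name ), $void_elements, true );
}
```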

Developed in WordPress/wordpress-develop#5913

Fixes #60307



git-svn-id: https://develop.svn.wordpress.org/trunk@57319 602fd350-edb4-49c9-b593-d223f7449a82

* Docs: Correct the placement of `@global` tags in `wp-settings.php`.

Props shailu25, mukesh27.
Fixes #60146.

git-svn-id: https://develop.svn.wordpress.org/trunk@57320 602fd350-edb4-49c9-b593-d223f7449a82

* Plugins: Correct table layout on smaller screens.

This ensures that the message about deleting a plugin or having no plugins installed is displayed in full width.

Follow-up to [26134], [33016].

Props shailu25, mukesh27, passoniate, JavierCasares, sabernhardt.
Fixes #50069.

git-svn-id: https://develop.svn.wordpress.org/trunk@57321 602fd350-edb4-49c9-b593-d223f7449a82

* Build/Test Tools: Expand "imagemin" Grunt task to cover default themes.

Runs `npm run grunt precommit:image` to minify/compress images in the repository.

Props desrosj.
Fixes #58996.

git-svn-id: https://develop.svn.wordpress.org/trunk@57322 602fd350-edb4-49c9-b593-d223f7449a82

* Bundled Theme: Fix a couple of incorrect theme name references.

Corrects the theme name used in docblocks in two places in Twenty Nineteen and Twenty Seventeen.

Props shailu25, mukesh27.
Fixes #60310.


git-svn-id: https://develop.svn.wordpress.org/trunk@57323 602fd350-edb4-49c9-b593-d223f7449a82

* Twenty Twenty-Four: Update license information in readme.

Adds missing license information for bundled fonts.

Props acosmin, shailu25, poena, sabernhardt.
Fixes #59838

git-svn-id: https://develop.svn.wordpress.org/trunk@57324 602fd350-edb4-49c9-b593-d223f7449a82

* Docs: Correct the `WP_User_Query` location reference in query cache tests.

Follow-up to [1047/tests], [33749], [55657].

See #59651.

git-svn-id: https://develop.svn.wordpress.org/trunk@57325 602fd350-edb4-49c9-b593-d223f7449a82

* HTML API: Support PARAM, SOURCE, and TRACK tags.

Adds support for the following HTML elements to the HTML Processor:

 - PARAM, SOURCE, TRACK

Previously these elements were not supported and the HTML Processor would bail when encountering them. Now, with this patch applied, it will proceed to parse an HTML document when encountering those tags.

Props jonsurrell, dmsnell
Fixes #60283



git-svn-id: https://develop.svn.wordpress.org/trunk@57326 602fd350-edb4-49c9-b593-d223f7449a82

* Script Modules API: Rename `wp_module` to `wp_script_module`

Replaces all mentions of "module" with "script module", including function names, comments, and tests.

Follow up to [57269]

The list of functions renamed are:

 - `wp_module()`          -> `wp_script_module()`.
 - `wp_register_module()` -> `wp_register_script_module()`.
 - `wp_enqueue_module()`  -> `wp_enqueue_script_module()`.
 - `wp_dequeue_module()`  -> `wp_dequeue_script_module()`.
 - `WP_Script_Modules::print_enqueued_modules()` -> `WP_Script_Modules::print_enqueued_script_modules()`.
 - `WP_Script_Modules::print_module_preloads()`  -> `WP_Script_Modules::print_script_module_preloads()`.

It also adds PHP 7 typing to all the functions and improves the types of the `$deps` argument of `wp_register_script_module()` and `wp_enqueue_script_module()` using `@type`.

Props luisherranz, idad5, costdev, nefff, joemcgill, jorbin, swisspidy, jonsurrel, flixos90, gziolo, westonruter, bernhard-reiter, kamranzafar4343
See #56313



git-svn-id: https://develop.svn.wordpress.org/trunk@57327 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: fix classname output on blocks without layout.

Prevents layout classnames from being output on blocks with no layout support and no child layout classnames by returning early from `wp_render_layout_support_flag`.

Props andrewserong.
Fixes #60292.


git-svn-id: https://develop.svn.wordpress.org/trunk@57328 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: fix fluid font division by zero error when min and max viewport widths are equal.

Fixes a division-by-zero error by returning null from `wp_get_computed_fluid_typography_value` when `minViewportWidth` - `maxViewportWidth` is zero.
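Below is a minimal sketch of the guard. It assumes a hypothetical `example_fluid_font_size_slope()` helper that computes the growth of the font size per pixel of viewport width. The real core function computes a full CSS `clamp()` value and is not reproduced here.

```php
<?php
/**
 * Hypothetical helper, not the core function: returns how much the font
 * size grows per pixel of viewport width between two breakpoints.
 *
 * @return float|null Null when the viewport widths are equal, because the
 *                    slope would require dividing by zero.
 */
function example_fluid_font_size_slope( $min_viewport_width, $max_viewport_width, $min_font_size, $max_font_size ) {
	$viewport_range = $max_viewport_width - $min_viewport_width;

	// Guard before dividing: equal widths leave no range to interpolate over.
	if ( 0.0 === (float) $viewport_range ) {
		return null;
	}

	return ( $max_font_size - $min_font_size ) / $viewport_range;
}
```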

Props ramonopoly, mukesh27, andrewserong, audrasjb.
Fixes #60263.


git-svn-id: https://develop.svn.wordpress.org/trunk@57329 602fd350-edb4-49c9-b593-d223f7449a82

* Build Tools: Configure prettier properly.

Allows tools like prettier or VSCode to auto-format JS files properly.
It pulls the prettier config that is used in the Gutenberg repository.

Props gziolo.
Fixes #60316.

git-svn-id: https://develop.svn.wordpress.org/trunk@57330 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Set show_tagcloud to false for Pattern Categories.

Pattern Categories is a taxonomy used to categorize the patterns in the site editor.
It is not meant to be shown on the frontend or in tag clouds.

Props wildworks, mukesh27.
Fixes #60119.

git-svn-id: https://develop.svn.wordpress.org/trunk@57331 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Ensure PHPUnit10 compatibility for ThemeJson unit test.

Expecting E_STRICT, E_NOTICE, and E_USER_NOTICE errors is deprecated in PHPUnit 10.
This updates the test to rely on an exception instead.

Props antonvlasenko.
Fixes #60305.

git-svn-id: https://develop.svn.wordpress.org/trunk@57332 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Update the ThemeJson unit test to cover custom CSS feature.

In #59499 a fix was shipped for theme.json custom CSS
when applied to blocks with multiple CSS selectors.
This commit covers that fix with a unit test.

Props wildworks.
Fixes #60294.

git-svn-id: https://develop.svn.wordpress.org/trunk@57333 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Define the labels of the pattern category taxonomy.

In WordPress 6.5, the taxonomy is going to be rendered using a standard UI
in the editor, which means that all the labels need to be defined properly.

Props ntsekouras.
Fixes #60322.

git-svn-id: https://develop.svn.wordpress.org/trunk@57334 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Fix back to items label capitalization for the pattern categories.

This uses the same capitalization used in Tags or Link Categories taxonomies.

Props mukesh27.
See #60322.

git-svn-id: https://develop.svn.wordpress.org/trunk@57335 602fd350-edb4-49c9-b593-d223f7449a82

* General: Add $schema property to block and theme JSON files.

Additionally, this changeset fixes some of the `block.json` and `theme.json` files in PHPUnit tests by adding missing `title` properties to satisfy the schema. Those changes have no impact on the runtime whatsoever and do not change the result of unit tests.

Note that some block and theme JSON files still aren't valid according to the schema. Fixing is underway; the required changes will be merged subsequently.

Props jonsurrell, dmsnell, gziolo.
Fixes #60255.

git-svn-id: https://develop.svn.wordpress.org/trunk@57336 602fd350-edb4-49c9-b593-d223f7449a82

* I18N: Introduce a more performant localization library.

This introduces a more lightweight library for loading `.mo` translation files which offers increased speed and lower memory usage.
It also supports loading multiple locales at the same time, which makes locale switching faster too.

For plugins interacting with the `$l10n` global variable in core, a shim is added to retain backward compatibility with the existing `pomo` library.

In addition to that, this library supports translations contained in PHP files, avoiding a binary file format and leveraging OPcache if available.
If an `.mo` translation file has a corresponding `.l10n.php` file, the latter will be loaded instead.
This behavior can be adjusted using the new `translation_file_format` and `load_translation_file` filters.
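Below is a minimal sketch of the lookup preference described above. It assumes a hypothetical helper that receives the `.mo` path and returns the sibling `.l10n.php` file when that file is readable. It leaves out the `translation_file_format` and `load_translation_file` filters that let core adjust this behavior.

```php
<?php
/**
 * Hypothetical helper: given a path to an .mo file, prefers a readable
 * .l10n.php file with the same base name, falling back to the .mo file.
 */
function example_preferred_translation_file( $mo_file ) {
	if ( '.mo' !== substr( $mo_file, -3 ) ) {
		return $mo_file; // Not an .mo path; nothing to swap.
	}

	$php_file = substr( $mo_file, 0, -3 ) . '.l10n.php';

	return is_readable( $php_file ) ? $php_file : $mo_file;
}
```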

PHP translation files will typically be created by downloading language packs, but can also be generated by plugins.
See https://make.wordpress.org/core/2023/11/08/merging-performant-translations-into-core/ for more context.

Props dd32, swissspidy, flixos90, joemcgill, westonruter, akirk, SergeyBiryukov.
Fixes #59656.

git-svn-id: https://develop.svn.wordpress.org/trunk@57337 602fd350-edb4-49c9-b593-d223f7449a82

* I18N: Add missing variable in string replacement.

Ensures the preferred file name for lookup has the correct extension.

Follow-up to [57337].
See #59656.

git-svn-id: https://develop.svn.wordpress.org/trunk@57338 602fd350-edb4-49c9-b593-d223f7449a82

* I18N: Improve edge case handling in `WP_Translation_Controller`.

Prevents PHP warnings for possibly undefined array keys.
Also fixes incorrect `@covers` annotations.

Follow-up to [57337].
See #59656.

git-svn-id: https://develop.svn.wordpress.org/trunk@57339 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Unset reference used in foreach statement.

In PHP it is a good practice to unset $value if it was created by reference in a foreach loop, as the reference is still valid outside the loop, and this avoids accidental bugs.
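Below is a small demonstration of the pitfall, using a hypothetical `example_double_then_read()` helper. After a by-reference `foreach`, `$value` still aliases the last element. A later by-value loop that reuses the name therefore overwrites that element, unless the reference is unset first.

```php
<?php
/**
 * Doubles every element by reference, then runs a read-only loop that reuses
 * $value. Pass false for $unset_reference to reproduce the aliasing bug.
 */
function example_double_then_read( array $values, $unset_reference ) {
	foreach ( $values as &$value ) {
		$value *= 2;
	}

	if ( $unset_reference ) {
		unset( $value ); // Break the link between $value and the last element.
	}

	foreach ( $values as $value ) {
		// Each assignment to $value also writes to the last element when
		// the reference above was not unset.
	}

	return $values;
}
```

With the reference unset, the result is `[2, 4, 6]`. Without it, the last element is clobbered and the result is `[2, 4, 4]`.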

Props get_dave.
Fixes #60326.

git-svn-id: https://develop.svn.wordpress.org/trunk@57340 602fd350-edb4-49c9-b593-d223f7449a82

* Script Loader: Only emit CDATA wrapper comments in `wp_get_inline_script_tag()` for JavaScript.

This avoids erroneously adding CDATA wrapper comments for non-JavaScript scripts, including those for JSON such as the `importmap` for script modules in #56313.
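Below is a hedged sketch of the check. It uses hypothetical helpers rather than the core code. A `type` that is missing, empty, `module`, or a JavaScript MIME type counts as JavaScript; the MIME list here is abbreviated from the HTML specification's longer list. Anything else, such as `importmap` or `application/json`, is left without CDATA wrapper comments.

```php
<?php
/**
 * Hypothetical helper: does this SCRIPT `type` attribute value run as JavaScript?
 */
function example_is_javascript_type( $type ) {
	// The type is matched ASCII case-insensitively after trimming whitespace.
	$type = strtolower( trim( $type ) );

	$javascript_types = array(
		'', // A missing or empty type attribute means classic JavaScript.
		'module',
		'text/javascript',
		'application/javascript',
		'text/ecmascript',
		'application/ecmascript',
	);

	return in_array( $type, $javascript_types, true );
}

/**
 * Hypothetical helper: wraps inline JavaScript in CDATA markers that are
 * themselves JavaScript comments, and leaves other script data untouched.
 */
function example_maybe_wrap_in_cdata( $data, $type = '' ) {
	if ( ! example_is_javascript_type( $type ) ) {
		return $data;
	}

	return "/* <![CDATA[ */\n{$data}\n/* ]]> */";
}
```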

Props westonruter, flixos90, mukesh27, dmsnell.
See #56313.
Fixes #60320.


git-svn-id: https://develop.svn.wordpress.org/trunk@57341 602fd350-edb4-49c9-b593-d223f7449a82

* Docs: Add missing full stop in `WP_Comment_Query::parse_query()` DocBlock.

Props hardik2221.
Fixes #60323.

git-svn-id: https://develop.svn.wordpress.org/trunk@57342 602fd350-edb4-49c9-b593-d223f7449a82

* HTML API: Support INPUT tags.

Adds support for the following HTML elements to the HTML Processor:

 - INPUT

Previously this element was not supported and the HTML Processor would bail when encountering one. Now, with this patch applied, it will proceed to parse the HTML document.

Developed in https://github.com/WordPress/wordpress-develop/pull/5907
Discussed in https://core.trac.wordpress.org/ticket/60283

Props jonsurrell
See #60283



git-svn-id: https://develop.svn.wordpress.org/trunk@57343 602fd350-edb4-49c9-b593-d223f7449a82

* I18N: Improve docblocks after [57337].

Props mukesh27.
See #59656.

git-svn-id: https://develop.svn.wordpress.org/trunk@57344 602fd350-edb4-49c9-b593-d223f7449a82

* Script Loader: Load the modules to the footer in classic themes

Incremental import maps fail if the import map is printed after the module scripts.
This means import maps must always be rendered first. For classic themes, the import map and modules therefore need to move to the footer, because which modules are needed cannot be known before then.

Props luisherranz, cbravobernal.
Fixes #60240.

git-svn-id: https://develop.svn.wordpress.org/trunk@57345 602fd350-edb4-49c9-b593-d223f7449a82

* HTML API: Scan all syntax tokens in a document, read modifiable text.

Since its introduction in WordPress 6.2, the HTML Tag Processor has
provided a way to scan through all of the HTML tags in a document and
then read and modify their attributes. In order to reliably do this, it
also needed to be aware of other kinds of HTML syntax, but it didn't
expose those syntax tokens to consumers of the API.

In this patch the Tag Processor introduces a new scanning method and a
few helper methods to read information about or from each token. Most
significantly, this introduces the ability to read `#text` nodes in the
document.

What's new in the Tag Processor?
================================

 - `next_token()` visits every distinct syntax token in a document.
 - `get_token_type()` indicates what kind of token it is.
 - `get_token_name()` returns something akin to `DOMNode.nodeName`.
 - `get_modifiable_text()` returns the text associated with a token.
 - `get_comment_type()` indicates why a token represents an HTML comment.

Example usage.
==============

{{{
<?php
function strip_all_tags( $html ) {
        $text_content = '';
        $processor    = new WP_HTML_Tag_Processor( $html );

        while ( $processor->next_token() ) {
                if ( '#text' !== $processor->get_token_type() ) {
                        continue;
                }

                $text_content .= $processor->get_modifiable_text();
        }

        return $text_content;
}
}}}

What changes in the Tag Processor?
==================================

Previously, the Tag Processor would scan the opening and closing tag of
every HTML element separately. Now, however, there are special tags
which it only visits once, as if those elements were void tags without
a closer.

These are special tags because their content contains no other HTML or
markup, only non-HTML content.

 - SCRIPT elements contain raw text which is isolated from the rest of
   the HTML document and fed separately into a JavaScript engine. There
   are complicated rules to avoid escaping the script context in the HTML.
   The contents are left verbatim, and character references are not decoded.

 - TEXTAREA and TITLE elements contain plain text which is decoded
   before display, e.g. transforming `&amp;` into `&`. Any markup which
   resembles tags is treated as verbatim text and not a tag.

 - IFRAME, NOEMBED, NOFRAMES, STYLE, and XMP elements are similar to the
   textarea and title elements, but no character references are decoded.
   For example, `&amp;` inside a STYLE element is passed to the CSS engine
   as the literal string `&amp;` and _not_ as `&`.

Because it's important not to treat this inner content separately from the
elements containing it, the Tag Processor combines them when scanning
into a single match and makes their content available as modifiable
text (see below).

This means that the Tag Processor will no longer visit a closing tag for
any of these elements unless that tag is unexpected.

{{{
    <title>There is only a single token in this line</title>
    <title>There are two tokens in this line</title></title>
    </title><title>There are still two tokens in this line</title>
}}}
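Below is a rough sketch of the decoding difference between the special elements described above. It assumes a hypothetical helper that receives an element's raw inner content. PHP's `html_entity_decode()` stands in for the HTML API's own character-reference decoding and does not match it in every edge case.

```php
<?php
/**
 * Hypothetical helper: converts the raw content between a special element's
 * tags into the text a browser would use, or null for other elements.
 */
function example_special_element_text( $tag_name, $raw_content ) {
	switch ( strtoupper( $tag_name ) ) {
		case 'TEXTAREA':
		case 'TITLE':
			// RCDATA: tag-like markup stays literal, character references are decoded.
			return html_entity_decode( $raw_content, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

		case 'SCRIPT':
		case 'IFRAME':
		case 'NOEMBED':
		case 'NOFRAMES':
		case 'STYLE':
		case 'XMP':
			// Raw text and script data: passed along verbatim, with no decoding.
			return $raw_content;

		default:
			return null; // Not one of the special elements.
	}
}
```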

What are tokens?
================

The term "token" here is a parsing term, which means a primitive unit in
HTML. There are only a few kinds of tokens in HTML:

 - a tag has a name, attributes, and a closing or self-closing flag.
 - a text node, or `#text` node contains plain text which is displayed
   in a browser and which is decoded before display.
 - a DOCTYPE declaration indicates how to parse the document.
 - a comment is hidden from the display on a page but present in the HTML.

There are a few more kinds of tokens that the HTML Tag Processor will
recognize, some of which don't exist as concepts in HTML. These mostly
comprise XML syntax elements that aren't part of HTML (such as CDATA and
processing instructions) and invalid HTML syntax that transforms into
comments.

What is a funky comment?
========================

This patch treats a specific kind of invalid comment in a special way.
A closing tag with an invalid name is considered a "funky comment." In
the browser these become HTML comments just like any other, but their
syntax is convenient for representing a variety of bits of information
in a well-defined way, and the parsing rules for this invalid syntax
ensure that they cannot be nested or recursive.

 - `</1>`
 - `</%avatar_url>`
 - `</{"wp_bit": {"type": "post-author"}}>`
 - `</[post-author]>`
 - `</__( 'Save Post' );>`

All of these examples become HTML comments in the browser. The content
inside a funky comment is easily parsable: the only rule is that it
starts at the `<` and continues until the nearest `>`. There can be no
funky comment inside another, because that would imply having a `>`
inside the first one, which would actually terminate it.
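Below is a minimal sketch of the "nearest `>`" rule, using a hypothetical `example_find_funky_comments()` helper. A `</` followed by an ASCII letter starts a real closing tag, and `</>` is dropped entirely. Anything else runs to the nearest `>`. A real tokenizer must also skip normal comments and special elements such as SCRIPT first, which this sketch does not do.

```php
<?php
/**
 * Hypothetical sketch: returns the inner text of every funky comment found
 * in $html, in document order.
 */
function example_find_funky_comments( $html ) {
	$found = array();
	$at    = 0;

	while ( false !== ( $at = strpos( $html, '</', $at ) ) ) {
		$next = $html[ $at + 2 ] ?? '';

		// "</p" starts a closing tag, "</>" is dropped, "</" at the end is text.
		if ( '' === $next || '>' === $next || 1 === preg_match( '/^[a-zA-Z]$/', $next ) ) {
			$at += 2;
			continue;
		}

		// Everything up to the nearest ">" is the comment, so it cannot nest.
		$end = strpos( $html, '>', $at + 2 );
		if ( false === $end ) {
			break; // Unterminated at the end of input; ignored in this sketch.
		}

		$found[] = substr( $html, $at + 2, $end - $at - 2 );
		$at      = $end + 1;
	}

	return $found;
}
```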

What is modifiable text?
========================

Modifiable text is similar to the `innerText` property of a DOM node.
It represents the span of text for a given token which may be modified
without changing the structure of the HTML document or the token.

There is currently no mechanism to change the modifiable text, but this
is planned to arrive in a later patch.

Tags
====

Most tags have no modifiable text because they have child nodes where
text nodes are found. Only the special tags mentioned above have
modifiable text.

{{{
    <div class="post">Another day in HTML</div>
    └─ tag ──────────┘└─ text node ─────┘└────┴─ tag
}}}

{{{
    <title>Is <img> &gt; <image>?</title>
    │      └ modifiable text ───┘       │ "Is <img> > <image>?"
    └─ tag ─────────────────────────────┘
}}}

Text nodes
==========

Text nodes are entirely modifiable text.

{{{
    This HTML document has no tags.
    └─ modifiable text ───────────┘
}}}

Comments
========

The modifiable text inside a comment is the portion of the comment that
doesn't form its syntax. This also applies to a number of invalid comments.

{{{
    <!-- this is inside a comment -->
    │   └─ modifiable text ──────┘  │
    └─ comment token ───────────────┘
}}}

{{{
    <!-->
    This invalid comment has no modifiable text.
}}}
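Below is a sketch of how the modifiable text of a normal comment can be located, using a hypothetical `example_comment_text()` helper. The text is the span between `<!--` and the next `-->`. The abruptly closed forms `<!-->` and `<!--->` have no text. The sketch deliberately ignores other closers such as `--!>` and the invalid comment forms shown below.

```php
<?php
/**
 * Hypothetical helper: returns the modifiable text of an HTML comment that
 * starts at byte offset $at, or null if no comment starts there.
 */
function example_comment_text( $html, $at = 0 ) {
	if ( '<!--' !== substr( $html, $at, 4 ) ) {
		return null;
	}

	$text_starts = $at + 4;

	// "<!-->" and "<!--->" are closed immediately and carry no text.
	if ( '>' === substr( $html, $text_starts, 1 ) || '->' === substr( $html, $text_starts, 2 ) ) {
		return '';
	}

	$closer = strpos( $html, '-->', $text_starts );
	if ( false === $closer ) {
		// A browser treats an unclosed comment as running to the end of the document.
		return substr( $html, $text_starts );
	}

	return substr( $html, $text_starts, $closer - $text_starts );
}
```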

{{{
    <? this is an invalid comment -->
    │ └─ modifiable text ────────┘  │
    └─ comment token ───────────────┘
}}}

{{{
    <![CDATA[this is an invalid comment]]>
    │        └─ modifiable text ──────┘  │
    └─ comment token ────────────────────┘
}}}

Other token types also have modifiable text. Consult the code or tests
for further information.

Developed in https://github.com/WordPress/wordpress-develop/pull/5683
Discussed in https://core.trac.wordpress.org/ticket/60170

Follows [57575]

Props bernhard-reiter, dlh, dmsnell, jonsurrell, zieladam
Fixes #60170



git-svn-id: https://develop.svn.wordpress.org/trunk@57348 602fd350-edb4-49c9-b593-d223f7449a82

* Docs: Fix typo in `_get_block_template_file()` DocBlock.

Follow-up to [55744].

See #59651.

git-svn-id: https://develop.svn.wordpress.org/trunk@57349 602fd350-edb4-49c9-b593-d223f7449a82

* I18N: Rename `WP_Translation_Controller::instance()` method to `get_instance()`.

This improves consistency as `get_instance()` is more commonly used in core. 

See #59656.

git-svn-id: https://develop.svn.wordpress.org/trunk@57350 602fd350-edb4-49c9-b593-d223f7449a82

* Twenty Twenty-Four: Change font family slug to lowercase.

Ensures referencing the correct CSS custom property.

Props RavanH, poena, onemaggie, huzaifaalmesbah, mukesh27.
Fixes #60325.

git-svn-id: https://develop.svn.wordpress.org/trunk@57351 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Fix Theme.json application of custom root selector for styles.

Theme.json stylesheets that use a custom root selector were generated with incorrect styles.

Props aaronrobertshaw, get_dave, mukesh27.
Fixes #60343.

git-svn-id: https://develop.svn.wordpress.org/trunk@57352 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Add video and audio pattern categories.

More categories, better organization for patterns
as they grow and power more WordPress websites.

Props aaronrobertshaw, get_dave.
Fixes #60342.

git-svn-id: https://develop.svn.wordpress.org/trunk@57353 602fd350-edb4-49c9-b593-d223f7449a82

* Block Hooks: Introduce a new `hooked_block_{$block_type}` filter.

Add a new `hooked_block_{$block_type}` filter that allows modifying a hooked block (in parsed block format) prior to insertion, while providing read access to its anchor block (in the same format).

This allows block authors to e.g. set a hooked block's attributes, or its inner blocks; the filter can peruse information about the anchor block when doing so. As such, this filter provides a solution to both #59572 and #60126.

The new filter is designed to strike a good balance and separation of concerns with regard to the existing [https://developer.wordpress.org/reference/hooks/hooked_block_types/ `hooked_block_types` filter], which allows addition or removal of a block to the list of hooked blocks for a given anchor block -- all of which are identified only by their block ''types''. This new filter, on the other hand, only applies to ''one'' hooked block at a time, and allows modifying the entire (parsed) hooked block; it also gives (read) access to the parsed anchor block.

Props gziolo, tomjcafferkey, andrewserong, isabel_brison, timbroddin, yansern.
Fixes #59572, #60126.

git-svn-id: https://develop.svn.wordpress.org/trunk@57354 602fd350-edb4-49c9-b593-d223f7449a82

* Block Hooks: Amend PHPDoc for `hooked_block_{$hooked_block_type}` filter.

Add missing explanation of the dynamic part of the hook name.

Follow-up [57354].
Props swissspidy.
See #59572, #60126.

git-svn-id: https://develop.svn.wordpress.org/trunk@57355 602fd350-edb4-49c9-b593-d223f7449a82

* Docs: Fix a few typos in `wp-includes/pomo/po.php`.

Props shailu25.
Fixes #60346.

git-svn-id: https://develop.svn.wordpress.org/trunk@57356 602fd350-edb4-49c9-b593-d223f7449a82

* Media: Redirect inactive attachment pages for logged-out users.

Ensure logged-out users are redirected to the media file when attachment pages are inactive. This removes the read_post capability check from the canonical redirects as anonymous users lack the permission.

This was previously committed in [57310] before being reverted in [57318]. This update includes a fix to cover instances where revealing a URL could be considered a data leak and greatly expands the unit tests to ensure that this is covered along with many other instances.

Follow-up to [56657], [56658], [56711], [57310], [57318].

Props peterwilsoncc, jorbin, afercia, aristath, chesio, joppuyo, jorbin, lakshmananphp, poena, sergeybiryukov, swissspidy, johnbillion.
Fixes #59866.
See #57913.


git-svn-id: https://develop.svn.wordpress.org/trunk@57357 602fd350-edb4-49c9-b593-d223f7449a82

* Build/Tests: Ensure set_error_handler is cleaned up.

Follow up to: [57332].

Fixes #60305.



git-svn-id: https://develop.svn.wordpress.org/trunk@57361 602fd350-edb4-49c9-b593-d223f7449a82

* Build/Test Tools: Update third-party GitHub Actions.

This updates the following third-party GitHub Actions to their latest versions:

- `actions/setup-node` from `3.8.1` to `4.0.1`
- `actions/upload-artifact` from `3.1.2` to `4.3.0`
- `shivammathur/setup-php` from `2.28.0` to `2.29.0`
- `actions/cache` from `3.3.2` to `4.0.0`
- `codecov/codecov-action` from `3.1.4` to `3.1.5`

Most notably, these updates silence newly encountered notices as a result of GitHub beginning to transition away from Node.js 16 to Node.js 20 (see https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/).

Props swissspidy.
See #59805.

git-svn-id: https://develop.svn.wordpress.org/trunk@57362 602fd350-edb4-49c9-b593-d223f7449a82

* Build/Test Tools: Update the `caniuse` data.

This updates the `caniuse-lite` database and includes all resulting CSS and built file changes, which are all minor changes due to fluctuations in browser usage.

Props gziolo, jonsurrell.
See #59657.

git-svn-id: https://develop.svn.wordpress.org/trunk@57363 602fd350-edb4-49c9-b593-d223f7449a82

* Coding Standards: Add missing escaping in `Custom_Image_Header::step_2()`.

Follow-up to [4673], [14907].

Props nareshbheda, audrasjb, kebbet.
Fixes #59278.

git-svn-id: https://develop.svn.wordpress.org/trunk@57364 602fd350-edb4-49c9-b593-d223f7449a82

* Coding Standards: Fix some spaces on block-supports background.

Running `composer format` applies these changes, so they are committed here to avoid seeing them again in the future.

git-svn-id: https://develop.svn.wordpress.org/trunk@57365 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Add original_source and author_text to the templates REST API.

For the new "All templates" UI to work properly, the REST API needs to provide two additional fields: `original_source` and `author_text`.

Props ntsekouras, get_dave.
Fixes #60358.

git-svn-id: https://develop.svn.wordpress.org/trunk@57366 602fd350-edb4-49c9-b593-d223f7449a82

* Script Loader: Clarify in docs that `wp_get_inline_script_tag()` and `wp_print_inline_script_tag()` can take non-JS data.

Props vladimiraus.
Fixes #60331.


git-svn-id: https://develop.svn.wordpress.org/trunk@57367 602fd350-edb4-49c9-b593-d223f7449a82

* Tests: Expand `sanitize_text_field()` tests.

This change ensures that the `sanitize_text_field` and `sanitize_textarea_field` filters are correctly invoked for the respective functions.

Follow-up to [38944].

Props pbearne, audrasjb.
Fixes #60357.

git-svn-id: https://develop.svn.wordpress.org/trunk@57368 602fd350-edb4-49c9-b593-d223f7449a82

* Coding Standards: Add missing escaping functions to `WP_Customize_Control` and `WP_Customize_Nav_Menu_Location_Control`.

Follow-up to [20295], [32806].

Props nareshbheda, shailu25, sabernhardt, audrasjb.
Fixes #60324.





git-svn-id: https://develop.svn.wordpress.org/trunk@57369 602fd350-edb4-49c9-b593-d223f7449a82

* Docs: Improve various globals documentation, as per docblock standards.

Props upadalavipul, audrasjb, shailu25, viralsampat.
Fixes #59255.
See #59651.





git-svn-id: https://develop.svn.wordpress.org/trunk@57370 602fd350-edb4-49c9-b593-d223f7449a82

* Docs: Typo correction in `wp_internal_hosts` docblock.

Follow-up to [55289].

Props shailu25.
Fixes #60363.





git-svn-id: https://develop.svn.wordpress.org/trunk@57371 602fd350-edb4-49c9-b593-d223f7449a82

* Coding Standards: Use strict type check for `in_array()` in `get_hooked_block_markup()`.

This aims to prevent type juggling causing incorrect results.
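Below is a short illustration of the type juggling that a strict check avoids. The values are arbitrary examples and are not taken from `get_hooked_block_markup()`.

```php
<?php
// With the default loose comparison, true == 'core/paragraph' because any
// non-empty string is truthy, so the "search" succeeds by accident.
$loose_match = in_array( true, array( 'core/paragraph' ) );

// Passing true as the third argument compares types as well as values.
$strict_match = in_array( true, array( 'core/paragraph' ), true );

// Numeric strings are compared as numbers when loose: '1e1' == '10'.
$loose_numeric  = in_array( '1e1', array( '10' ) );
$strict_numeric = in_array( '1e1', array( '10' ), true );
```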

Follow-up to [57157].

Props jrf.
See #60279.

git-svn-id: https://develop.svn.wordpress.org/trunk@57372 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Add registry for block binding sources

It is part of the sync from the Gutenberg plugin that introduces the registry for block binding sources required for the new Block Bindings API: https://github.com/WordPress/gutenberg/issues/54536.

See #60282.
Props czapla, artemiosans, santosguillamot, sc0ttkclark, lgladdy, talldanwp, swissspidy, youknowriad, fabiankaegy.



git-svn-id: https://develop.svn.wordpress.org/trunk@57373 602fd350-edb4-49c9-b593-d223f7449a82

* Coding Standards: Remove unnecessary access and internal annotations from two functions in WP_REST_Templates_Controller.

This commit removes unnecessary access and internal annotations from two functions that are private and as such don't require the annotation. It also adds the since annotation with the 6.5 release given that the annotation may be useful.

Props swissspidy.
See #60358.

git-svn-id: https://develop.svn.wordpress.org/trunk@57374 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Add Block Bindings API helpers

It is part of the sync from the Gutenberg plugin that introduces the registry for block binding sources required for the new Block Bindings API: WordPress/gutenberg#54536.

See #60282.
Follow-up [57373].
Props czapla, artemiosans, santosguillamot, sc0ttkclark, lgladdy, talldanwp, swissspidy, youknowriad, fabiankaegy, mukesh27.



git-svn-id: https://develop.svn.wordpress.org/trunk@57375 602fd350-edb4-49c9-b593-d223f7449a82

* Build/Test Tools: Update third-party Slack action.

This updates the `slackapi/slack-github-action` from `1.24.0` to `1.25.0`. This fixes more GitHub Action deprecated notices.

Follow up to [57362].

See #59805.

git-svn-id: https://develop.svn.wordpress.org/trunk@57376 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Update the WordPress packages to the Gutenberg 16.7 RC2 version.

This patch, while somewhat small, brings a lot to WordPress.
This includes features like:

 - DataViews.
 - Customization tools like box shadow, background size and repeat.
 - UI improvements in the site editor. 
 - Preferences sharing between the post and site editors.
 - Unified panels and editors between post and site editors.
 - Improved template mode in the post editor.
 - Iterations to multiple interactive blocks.
 - Preparing the blocks and UI for pattern overrides.
 - and a lot more.

Props luisherranz, gziolo, isabel_brison, costdev, jonsurrell, peterwilsoncc, get_dave, antonvlasenko, desrosj.
See #60315.

git-svn-id: https://develop.svn.wordpress.org/trunk@57377 602fd350-edb4-49c9-b593-d223f7449a82

* Coding Standards: Update PHPCS to version 3.8.1.

PHPCS has seen two new releases since the update to WPCS 3.0, with especially the 3.8.0 version containing a huge number of improvements.

References:
* [https://github.com/PHPCSStandards/PHP_CodeSniffer/releases/tag/3.8.0 PHP_CodeSniffer 3.8.0 release notes]
* [https://github.com/PHPCSStandards/PHP_CodeSniffer/releases/tag/3.8.1 PHP_CodeSniffer 3.8.1 release notes]

Follow-up to [56695].

Props jrf, swissspidy.
Fixes #60279.

git-svn-id: https://develop.svn.wordpress.org/trunk@57378 602fd350-edb4-49c9-b593-d223f7449a82

* Build/Test Tools: Test against MySQL 8.3

Version 8.3 is the latest short-term innovation release of MySQL.

See #59779.

git-svn-id: https://develop.svn.wordpress.org/trunk@57379 602fd350-edb4-49c9-b593-d223f7449a82

* REST API: Support assigning terms when creating attachments.

Props mukesh27, Dharm1025, Ankit K Gupta, swissspidy, dharm1025, tanjimtc71, timothyblynjacobs, spacedmonkey.
Fixes #57897.

git-svn-id: https://develop.svn.wordpress.org/trunk@57380 602fd350-edb4-49c9-b593-d223f7449a82

* I18N: Ensure `.l10n.php` files are deleted when upgrading language packs.

Props amieiro.
See #59656.

git-svn-id: https://develop.svn.wordpress.org/trunk@57381 602fd350-edb4-49c9-b593-d223f7449a82

* I18N: Delete `.l10n.php` files when deleting a theme.

Follow-up to [57337] where this was already added for plugins.

See #59656.

git-svn-id: https://develop.svn.wordpress.org/trunk@57382 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Fix PHP warning in Layout block support.

`strpos()` was triggering a PHP warning.
This also updates the code to use `str_contains()`, which is now supported.

Props get_dave, dmsnell, ocean90, mukesh27.
Fixes #60327.

git-svn-id: https://develop.svn.wordpress.org/trunk@57383 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Update the minimum compatible version of Gutenberg.

Previous Gutenberg versions are not compatible with recent trunk because both
define the `WP_Navigation_Block_Renderer` class.

Gutenberg has been updated to avoid the use of this class, but older versions of
the plugin need to be auto-disabled to avoid fatal errors.

Props hellofromtonya.
See #60315.

git-svn-id: https://develop.svn.wordpress.org/trunk@57384 602fd350-edb4-49c9-b593-d223f7449a82

* Tests: Remove redundant unregister call in block bindings tear down

Only block bindings sources registered in the tests should get unregistered.

Follow-up for [57375].
See #60282.
Props czapla.



git-svn-id: https://develop.svn.wordpress.org/trunk@57385 602fd350-edb4-49c9-b593-d223f7449a82

* I18N: Improve singular lookup of pluralized strings.

Ensures that looking up a singular that is also used as a pluralized string works as expected.
This improves compatibility for cases where, for example, both `__( 'Product' )` and `_n( 'Product', 'Products', $num )` are used in a project, in which case both use the same translation for the singular version.

Although such usage is neither recommended nor documented, it must continue to work in the new i18n library in order to maintain backward compatibility and expected behavior.

See #59656.

git-svn-id: https://develop.svn.wordpress.org/trunk@57386 602fd350-edb4-49c9-b593-d223f7449a82

* I18N: Add missing space after `foreach` keyword.

Follow-up to [57386].

See #59656.

git-svn-id: https://develop.svn.wordpress.org/trunk@57387 602fd350-edb4-49c9-b593-d223f7449a82

* Uploads: Check for and verify ZIP archives.

Props costdev, peterwilsoncc, azaozz, tykoted, johnbillion, desrosj, afragen, jorbin.


git-svn-id: https://develop.svn.wordpress.org/trunk@57388 602fd350-edb4-49c9-b593-d223f7449a82

* Install: When populating options, maybe_serialize instead of always serialize.

Props xknown, peterwilsoncc, jorbin, desrosj.


git-svn-id: https://develop.svn.wordpress.org/trunk@57389 602fd350-edb4-49c9-b593-d223f7449a82

* HTML API: Fix splitting single text node.

When `next_token()` was introduced, it brought a subtle bug. When encountering a `<` in the HTML stream which did not lead to a tag or comment or other token, it was treating the full text span to that point as one text node, and the following span another text node.

The entire span should be one text node.

In this patch the Tag Processor properly detects this scenario and combines the spans into one text node.
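
A minimal sketch of the intended behavior, under simplifying assumptions: a `<` ends a text node only when the byte after it could start a token (an ASCII letter, `/`, `!`, or `?`, roughly following the HTML tokenizer's tag open state). The helper `find_text_node_end()` is hypothetical and not part of the Tag Processor.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): returns the byte offset at
 * which the text node starting at $at ends. A "<" only ends the text when the
 * byte after it could begin a tag, comment, or other token; otherwise the "<"
 * is ordinary text and scanning continues, so the whole span stays one node.
 */
function find_text_node_end( string $html, int $at ): int {
	$length = strlen( $html );

	while ( $at < $length ) {
		$lt = strpos( $html, '<', $at );
		if ( false === $lt || $lt + 1 >= $length ) {
			// No further "<", or a trailing "<": all remaining bytes are text.
			return $length;
		}

		$next = $html[ $lt + 1 ];
		if (
			'/' === $next || '!' === $next || '?' === $next ||
			( 'A' <= $next && 'Z' >= $next ) ||
			( 'a' <= $next && 'z' >= $next )
		) {
			// This "<" could start a token, so the text node ends here.
			return $lt;
		}

		// Not a token start: keep scanning after this "<".
		$at = $lt + 1;
	}

	return $length;
}
```

For `a < b <em>` the first `<` is followed by a space and stays inside the text, so the single text node ends at the `<` of `<em>`.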

Follow-up to [57348]

Props jonsurrell
Fixes #60385



git-svn-id: https://develop.svn.wordpress.org/trunk@57489 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: reduce specificity of block style variation selector.

Removes duplicate classname from the block style variation selector generated in `WP_Theme_JSON`’s `get_blocks_metadata` function.

Props flixos90, joemcgill, mukesh27, isabel_brison.
Fixes #60312.


git-svn-id: https://develop.svn.wordpress.org/trunk@57490 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: introduce `dimensions.aspectRatio` block support.

Adds front end rendering logic for the `dimensions.aspectRatio` block support as well as the required logic in `WP_Theme_JSON` and the style engine.

Props andrewserong.
Fixes #60365.


git-svn-id: https://develop.svn.wordpress.org/trunk@57491 602fd350-edb4-49c9-b593-d223f7449a82

* Script Modules API: Add import map polyfill for older browsers

Syncs the changes from https://github.com/WordPress/gutenberg/pull/58263. Adds a polyfill to make import maps work in unsupported browsers (https://caniuse.com/import-maps).

Fixes #60348.
Props cbravobernal, jorbin, luisherranz, jonsurrell.



git-svn-id: https://develop.svn.wordpress.org/trunk@57492 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Add `viewStyle` property to `block.json` for frontend-only block styles

Related issue in Gutenberg: https://github.com/WordPress/gutenberg/issues/54491.

For block scripts there were already `script`, `viewScript`, and `editorScript`. For block styles there were only `style` and `editorStyle`. This brings parity between the two.

Props gaambo.
Fixes #59673. 



git-svn-id: https://develop.svn.wordpress.org/trunk@57493 602fd350-edb4-49c9-b593-d223f7449a82

* REST API: Add route for single styles revisions.

Adds a route for single global styles revisions: /wp/v2/global-styles/${ parentId }/revisions/${ revisionsId }
This fixes the `getRevision` actions in the core-data package.

Props ramonopoly, get_dave.
Fixes #59810.

git-svn-id: https://develop.svn.wordpress.org/trunk@57494 602fd350-edb4-49c9-b593-d223f7449a82

* Twenty Eleven: Fix typo in `twentyeleven_widgets_init()` description.

Follow-up to [17738].

Props harshgajipara.
See #60383.

git-svn-id: https://develop.svn.wordpress.org/trunk@57495 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Sanitize nested array in theme.json properly.

`WP_Theme_JSON` sanitization can now sanitize data contained in indexed arrays,
so data from theme.json that is stored as a JSON array, for example `settings.typography.fontFamilies`, is now sanitized.
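
To illustrate the idea, here is a hypothetical sketch; the function name `sanitize_with_schema()` and the schema shape are assumptions, not `WP_Theme_JSON` internals. A one-item list schema describes an indexed array, and every item is sanitized against it instead of being skipped:

```php
<?php
/**
 * Hypothetical sketch (not WP_Theme_JSON's code). A schema maps allowed keys
 * to either true (keep a scalar value) or a nested schema. A nested schema of
 * the form array( $item_schema ) describes an indexed array whose items are
 * each sanitized with $item_schema.
 */
function sanitize_with_schema( $value, $schema ) {
	if ( true === $schema ) {
		// Leaf: keep scalars, drop anything structured.
		return is_array( $value ) ? null : $value;
	}
	if ( ! is_array( $value ) ) {
		return null;
	}
	if ( array( 0 ) === array_keys( $schema ) ) {
		// Indexed array: sanitize every item with the single item schema.
		$items = array();
		foreach ( $value as $item ) {
			$clean = sanitize_with_schema( $item, $schema[0] );
			if ( null !== $clean ) {
				$items[] = $clean;
			}
		}
		return $items;
	}
	// Associative array: keep only the keys the schema allows.
	$result = array();
	foreach ( $schema as $key => $sub_schema ) {
		if ( array_key_exists( $key, $value ) ) {
			$clean = sanitize_with_schema( $value[ $key ], $sub_schema );
			if ( null !== $clean ) {
				$result[ $key ] = $clean;
			}
		}
	}
	return $result;
}
```

With a schema like `array( 'fontFamilies' => array( array( 'name' => true, 'slug' => true ) ) )`, unknown keys inside each font family entry are dropped.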

Props mmaattiiaass, mukesh27.
Fixes #60360.

git-svn-id: https://develop.svn.wordpress.org/trunk@57496 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Fix Theme.json font settings in unit test.

These changes fix incorrect font settings when testing the generation of a theme.json stylesheet.

Props aaronrobertshaw, mukesh27.
Fixes #60341.

git-svn-id: https://develop.svn.wordpress.org/trunk@57497 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Fix Theme.json font settings unit test.

This file was omitted from the previous commit [57497].

See #60341.

git-svn-id: https://develop.svn.wordpress.org/trunk@57498 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Update WordPress packages to Gutenberg 16.7 RC3.

It brings a set of iterations and follow-ups to the initial package update.
It also fixes a regression that happened for interactive blocks.

Props gziolo, luisherranz, cbravobernal.
See #60315.

git-svn-id: https://develop.svn.wordpress.org/trunk@57499 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: fix small typos in block bindings API docblocks.

Props shailu25.
See #60282.
Fixes #60386.

git-svn-id: https://develop.svn.wordpress.org/trunk@57500 602fd350-edb4-49c9-b593-d223f7449a82

* HTTP API: Ensure cookie names are cast to strings.

Props nosilver4u, darssen, kraftbj, engahmeds3ed, barry.hughes, schlessera.
Fixes #58566.

git-svn-id: https://develop.svn.wordpress.org/trunk@57501 602fd350-edb4-49c9-b593-d223f7449a82

* Twenty Twenty-Three: Rename Comments template part.

This renames the Comments template part to 'Comments Template Part', to reduce confusion with the 'Comments' block when viewing both in the inserter.

Props mikachan, mukesh27, poena.
Fixes #56999.

git-svn-id: https://develop.svn.wordpress.org/trunk@57502 602fd350-edb4-49c9-b593-d223f7449a82

* Script Loader: Use a global variable in `wp_script_modules()`.

This brings the function more in line with its related `wp_scripts()` and `wp_styles()` functions and makes it easier to reset the class instance in tests.

Props westonruter, luisherranz.
See #56313.

git-svn-id: https://develop.svn.wordpress.org/trunk@57503 602fd350-edb4-49c9-b593-d223f7449a82

* I18N: Load new translation library in `wp_load_translations_early()`.

Ensures localization continues to work as expected with the new library in case
translations need to be loaded early in the process.

See #59656.

git-svn-id: https://develop.svn.wordpress.org/trunk@57504 602fd350-edb4-49c9-b593-d223f7449a82

* I18N: Revert [57386] pending further investigation.

Reverts the change for fallback string lookup due to a performance regression in the worst-case scenario.

See #59656.

git-svn-id: https://develop.svn.wordpress.org/trunk@57505 602fd350-edb4-49c9-b593-d223f7449a82

* HTML API: Fix CDATA lookalike matching invalid CDATA

When `next_token()` was introduced to the HTML Tag Processor, it started
classifying comments that look like they were intended to be CDATA sections.
In one of the changes made during development, however, a typo slipped
through code review that treated comments as CDATA even if they only
ended in `]>` and not the required `]]>`.

The consequences of this defect were minor because in all cases these are
treated as HTML comments from invalid syntax, but this patch adds the
missing check to ensure the proper reporting of CDATA-lookalikes.
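
The corrected rule can be sketched as a standalone check. This is a hypothetical helper, not the Tag Processor's code, and it assumes it receives the full comment token text from `<!` through the closing `>`:

```php
<?php
/**
 * Hypothetical helper (not the Tag Processor's code): given the full text of
 * a comment token found in HTML content, from "<!" through the closing ">",
 * report whether it looks like a CDATA section. The "<![CDATA[" opener is
 * matched case-sensitively, and the token must end with the full "]]>"
 * closer; ending in "]>" alone is not enough.
 */
function is_cdata_lookalike( string $token ): bool {
	$opener = '<![CDATA[';
	$closer = ']]>';

	return strlen( $token ) >= strlen( $opener ) + strlen( $closer )
		&& substr( $token, 0, strlen( $opener ) ) === $opener
		&& substr( $token, -strlen( $closer ) ) === $closer;
}
```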

Follow-up to [57348]

Props jonsurrell
Fixes #60406



git-svn-id: https://develop.svn.wordpress.org/trunk@57506 602fd350-edb4-49c9-b593-d223f7449a82

* HTML API: Fix void tag nesting with next_token

When `next_token()` was introduced, it introduced a regression in the HTML
Processor whereby void tags remain on the stack of open elements when they
shouldn't. This led to invalid values returned from `get_breadcrumbs()`.

The reason was that calling `next_token()` works through a different code path
than the one the HTML Processor uses for everything else. To solve this, the
HTML Processor's overridden `next_token()` called
`step( self::REPROCESS_CURRENT_TOKEN )` so that the proper HTML accounting takes place.

Unfortunately that same reprocessing code path skipped the step whereby void
and self-closing elements are popped from the stack of open elements.

In this patch, that step is run with a third mode for `step()`, which is the
new `self::PROCESS_CURRENT_TOKEN`. This mode acts as if `self::PROCESS_NEXT_NODE`
were called, except it doesn't advance the parser.
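
A heavily simplified model may help show why the skipped step matters. In this hypothetical sketch (the class and method names are assumptions, not the HTML Processor's API), a void element is pushed so that it appears in its own breadcrumbs, then popped before the next token is handled:

```php
<?php
/**
 * Hypothetical, heavily simplified model (not the HTML Processor's code) of
 * the stack of open elements. Skipping the void-element pop, as the
 * reprocessing path did, would leave IMG on the stack and corrupt every
 * later breadcrumb.
 */
final class Open_Elements_Sketch {
	const VOID_ELEMENTS = array( 'AREA', 'BASE', 'BR', 'COL', 'EMBED', 'HR', 'IMG', 'INPUT', 'LINK', 'META', 'SOURCE', 'TRACK', 'WBR' );

	/** @var string[] Tag names of the open elements, outermost first. */
	private $stack = array();

	/** Handles one tag token; $is_closer marks end tags such as "</p>". */
	public function process_tag( string $tag_name, bool $is_closer ): void {
		// Pop a void element left on top of the stack by the previous token.
		$top = end( $this->stack );
		if ( false !== $top && in_array( $top, self::VOID_ELEMENTS, true ) ) {
			array_pop( $this->stack );
		}

		$tag_name = strtoupper( $tag_name );
		if ( $is_closer ) {
			// Pop up to and including the innermost matching open element.
			$matches = array_keys( $this->stack, $tag_name, true );
			if ( array() !== $matches ) {
				$this->stack = array_slice( $this->stack, 0, max( $matches ) );
			}
			return;
		}

		$this->stack[] = $tag_name;
	}

	/** Returns the breadcrumbs for the current token, e.g. array( 'DIV', 'IMG' ). */
	public function get_breadcrumbs(): array {
		return $this->stack;
	}
}
```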

Developed in https://github.com/WordPress/wordpress-develop/pull/5975
Discussed in https://core.trac.wordpress.org/ticket/60382

Follow-up to [57348]

Props dmsnell, jonsurrell
Fixes #60382



git-svn-id: https://develop.svn.wordpress.org/trunk@57507 602fd350-edb4-49c9-b593-d223f7449a82

* HTML API: Test cleanup

Rename `$p` variable to `$processor` in tests for clarity.

Use static data providers. A mix of static and non-static data providers were
used in HTML API tests.  Data providers are required to be static in the next
PHPUnit version and there's no harm in using them consistently now.

Follow-up to [57507]

Props jonsurrell
See #59647



git-svn-id: https://develop.svn.wordpress.org/trunk@57508 602fd350-edb4-49c9-b593-d223f7449a82

* Docs: Fix typo in `do_robots()` docblock.

This was introduced in [45928].

Props shailu25, mukesh27.
Fixes #60405.

git-svn-id: https://develop.svn.wordpress.org/trunk@57509 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Remove shadow support via direct attribute.

Shadow block support should always rely on the style attribute instead.

Props madhudollu.
Fixes #60377.

git-svn-id: https://develop.svn.wordpress.org/trunk@57510 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Add deprecated functions from interactivity core blocks.

In 6.5, a couple of functions in core blocks that enqueued the files needed for interactivity are being removed. Interactivity is now handled with modules, so those functions are no longer needed and are deprecated.

Props swissspidy, cbravobernal.
Fixes #60380.

git-svn-id: https://develop.svn.wordpress.org/trunk@57511 602fd350-edb4-49c9-b593-d223f7449a82

* Twenty Fifteen: Fix typo in `css/blocks.css`.

Follow-up to [43798].

Props shailu25, harshgajipara.
Fixes #60383.

git-svn-id: https://develop.svn.wordpress.org/trunk@57512 602fd350-edb4-49c9-b593-d223f7449a82

* I18N: Improve singular lookup of pluralized strings.

Ensures that string lookup in MO files only uses the singular string.

This matches expected behavior with gettext files and improves compatibility for cases where, for example, both `__( 'Product' )` and `_n( 'Product', 'Products', $num )` are used in a project, in which case both use the same translation for the singular version. Maintains backward compatibility and feature parity with the pomo library and the PHP translation file format.
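
In MO files a pluralized original is stored as `singular\0plural` and its translation as NUL-separated forms. A hypothetical sketch of singular-only keying (the function names are assumptions, not `WP_Translation_File` internals) looks like this:

```php
<?php
/**
 * Hypothetical sketch: builds a lookup table from MO-style entries, keying
 * each entry by its singular original only, so that __( 'Product' ) can find
 * a translation that was provided through _n( 'Product', 'Products', $num ).
 */
function build_lookup_table( array $entries ): array {
	$table = array();
	foreach ( $entries as $original => $translation ) {
		$original = (string) $original; // Numeric-looking keys become ints in PHP arrays.
		$nul      = strpos( $original, "\0" );
		$key      = false === $nul ? $original : substr( $original, 0, $nul );

		$table[ $key ] = $translation;
	}
	return $table;
}

/** Returns the singular translation of $text, or $text itself when missing. */
function translate_singular( array $table, string $text ): string {
	if ( ! isset( $table[ $text ] ) ) {
		return $text;
	}
	// For a pluralized entry, the first NUL-separated form is the singular.
	return explode( "\0", $table[ $text ] )[0];
}
```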

Replaces [57386], which was reverted in [57505], with a more accurate and performant solution.

See #59656.

git-svn-id: https://develop.svn.wordpress.org/trunk@57513 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Add the Block Bindings API.

This introduces the Block Bindings API for WordPress.

The API allows developers to connect block attributes to different sources. In this PR, two such sources are included: "post meta" and "pattern". Attributes connected to sources can have their HTML replaced by values coming from the source in a way defined by the binding.

Props czapla, lgladdy, gziolo, sc0ttkclark, swissspidy, artemiosans, kevin940726, fabiankaegy, santosguillamot, talldanwp, wildworks.
Fixes #60282.

git-svn-id: https://develop.svn.wordpress.org/trunk@57514 602fd350-edb4-49c9-b593-d223f7449a82

* Media: Prevent local edits during media upload.

Prevent `options.allowLocalEdits` from toggling to true during the upload cycle. Otherwise, media meta fields can be edited, but the data will be lost as soon as the upload process is completed.

Props codepo8, oglekler, nicolefurlan, antpb, syamraj24, joedolson.
Fixes #58783, #23374.

git-svn-id: https://develop.svn.wordpress.org/trunk@57515 602fd350-edb4-49c9-b593-d223f7449a82

* I18N: Support loading `.l10n.php` translation files on their own.

Adjusts the translation file lookup in `WP_Textdomain_Registry` so that just-in-time translation loading
works even if there is only a `.l10n.php` translation file without a corresponding `.mo` file.

While language packs continue to contain both file types, this makes it easier to use translations in a project
without having to deal with `.mo` or `.po` files.

Props Chrystl.
See #59656.

git-svn-id: https://develop.svn.wordpress.org/trunk@57516 602fd350-edb4-49c9-b593-d223f7449a82

* Build/Test Tools: Introduce Props Bot workflow.

Props Bot is a new GitHub Action that will compile a list of contributors for a given pull request. The bot will leave a comment with a list of contributors formatted for use in both Trac SVN and GitHub.

Props dharm1025, desrosj, jorbin, jeffpaul, dd32, pento, gziolo, swissspidy, talldanwp, noisysocks, youknowriad, peterwilsoncc, joemcgill, chrisdavidmiles, wpscholar, annezazu, chanthaboune, desrosjbot.
See #60417.

git-svn-id: https://develop.svn.wordpress.org/trunk@57517 602fd350-edb4-49c9-b593-d223f7449a82

* I18N: Fix plural forms parsing in `WP_Translation_File`.

Ensures the plural expression from the translation file header is correctly parsed.
Prevents silent failures in the attempt to create the plural form function.
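
As an illustration of the parsing step only, a hypothetical helper might extract the `plural=` expression from a header value like `nplurals=2; plural=(n != 1);` and return `null` rather than failing silently. This is a sketch, not `WP_Translation_File`'s parser, which must also turn the expression into a callable plural form function.

```php
<?php
/**
 * Hypothetical sketch: pulls the plural expression out of a Plural-Forms
 * header value. Returns null when no expression is present so that callers
 * can report the problem instead of failing silently.
 */
function get_plural_expression( string $header ): ?string {
	// "plural" followed by "=" (so "nplurals=" does not match), up to the next ";".
	if ( ! preg_match( '/plural\s*=\s*([^;]+)/', $header, $matches ) ) {
		return null;
	}
	return trim( $matches[1] );
}
```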

Adds additional tests.

Props Chouby.
See #59656.

git-svn-id: https://develop.svn.wordpress.org/trunk@57518 602fd350-edb4-49c9-b593-d223f7449a82

* I18N: Add type declaration to new method missed in [57518].

See #59656.

git-svn-id: https://develop.svn.wordpress.org/trunk@57519 602fd350-edb4-49c9-b593-d223f7449a82

* Administration: Accessibility: Use the default cursor style for labels and disabled form controls.

The native cursor style for labels and form controls is `default`, which is the platform-dependent default cursor, typically an arrow. Historically, WordPress always used the `pointer` style for all form controls and labels. While this isn't standard, there is some value in using the `pointer` style for form controls. However, labels should use the default style, especially when the associated controls are disabled.
Additionally, makes sure the disabled styling works for form controls with an `aria-disabled="true"` attribute.

Props joedolson, afercia.
Fixes #59733.


git-svn-id: https://develop.svn.wordpress.org/trunk@57520 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Add `allowed_blocks` field to block registration and REST API

There is a new block.json field called `allowedBlocks`, added in Gutenberg in https://github.com/WordPress/gutenberg/pull/58262. This adds support for the new field on the server as well.

Props: gziolo, jsnajdr.
Fixes #60403.




git-svn-id: https://develop.svn.wordpress.org/trunk@57521 602fd350-edb4-49c9-b593-d223f7449a82

* Coding Standards: Use strict comparison for functions lookup in plugin/theme editors.

Follow-up to [10607], [44617].

Props upadalavipul.
See #60415.

git-svn-id: https://develop.svn.wordpress.org/trunk@57522 602fd350-edb4-49c9-b593-d223f7449a82

* Build/Test Tools: Some improvements to the Props Bot workflow.

This makes a few improvements to the Props Bot workflow:

- The bot will no longer run on draft PRs.
- The bot will no longer run on closed PRs.
- The bot will no longer run when a comment is deleted (this should almost never happen).

Props mamaduka, gziolo.
See #60417.

git-svn-id: https://develop.svn.wordpress.org/trunk@57523 602fd350-edb4-49c9-b593-d223f7449a82

* Media: enable AVIF support.

Add support for uploading, editing and saving AVIF images when supported by the server.

Add 'image/avif' to supported mime types. Correctly identify AVIF images and sizes even when PHP doesn't support AVIF. Resize uploaded AVIF files (when supported) and use them in front-end markup.

Props adamsilverstein, lukefiretoss, ayeshrajans, navjotjsingh, Tyrannous, jb510, gregbenz, nickpagz, JavierCasares, mukesh27, yguyon, swissspidy.
Fixes #51228.



git-svn-id: https://develop.svn.wordpress.org/trunk@57524 602fd350-edb4-49c9-b593-d223f7449a82

* Media: fix AVIF tests.

Follow up to r57524. Properly add AVIF images for unit tests.

Fixes #51228.



git-svn-id: https://develop.svn.wordpress.org/trunk@57525 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Refactor the way block bindings sources are handled

It fixes the reported coding style issues and goes further by improving code quality in other places where the block bindings logic was added.

Follow-up for [57514].
Props: gziolo, mukesh27, youknowriad, santosguillamot.
See #60282.



git-svn-id: https://develop.svn.wordpress.org/trunk@57526 602fd350-edb4-49c9-b593-d223f7449a82

* HTML API: Reset parser state after seeking to bookmark.

When parser states were introduced, nothing in the `seek()` method reset the
parser state. This is problematic because it could leave the parser in the
wrong state.

In this patch the parser state is reset so that it's properly adjusted on
the successive call to `next_token()`.

Developed in https://github.com/WordPress/wordpress-develop/pull/6021
Discussed in https://core.trac.wordpress.org/ticket/60428

Follow-up to [57211]

Props dmsnell, kevin940726
Fixes #60428



git-svn-id: https://develop.svn.wordpress.org/trunk@57527 602fd350-edb4-49c9-b593-d223f7449a82

* HTML API: Fix typo setting the wrong self-closing flag.

The HTML Processor tracks whether a token was found with the self-closing flag.
Depending on the context, this flag may or may not indicate that the element is
self-closing. Unfortunately it's been tracking the wrong flag: it's been tracking
the end-tag flag, which indicates that a token is an end tag.

In this patch the right flag is set in the HTML Processor. This hasn't been an
issue because the HTML Processor doesn't yet read that stored flag, but it's an
important fix to make before adding support for foreign content (SVG and MathML)
since that behavior depends on reading the correct flag.

Follow-up to [56274].

Props dmsnell.



git-svn-id: https://develop.svn.wordpress.org/trunk@57528 602fd350-edb4-49c9-b593-d223f7449a82

* Coding Standards: Use strict comparison in `wp-admin/update-core.php`.

Follow-up to [11273], [25784], [54654].

Props wpfy, mukesh27, azaozz, viralsampat.
Fixes #58061, #60415.

git-svn-id: https://develop.svn.wordpress.org/trunk@57529 602fd350-edb4-49c9-b593-d223f7449a82

* Coding Standards: Rename the `$ID` parameter to `$post_id` in `trackback()`.

This resolves a few WPCS warnings:
{{{
Variable "$ID" is not in valid snake_case format, try "$i_d"
}}}

See #59650.

git-svn-id: https://develop.svn.wordpress.org/trunk@57530 602fd350-edb4-49c9-b593-d223f7449a82

* Build/Test Tools: Mock plugin API response in `WP_REST_Plugins_Controller_Test`.

Avoid false test failures due to network conditions in the `WP_REST_Plugins_Controller_Test` class. This mocks HTTP responses from the plugin information endpoint for the link-manager plugin.

Props: peterwilsoncc, costdev.
See #59647.



git-svn-id: https://develop.svn.wordpress.org/trunk@57531 602fd350-edb4-49c9-b593-d223f7449a82

* Coding Standards: Rename the `$expiresOffset` variable in `cache_javascript_headers()`.

This resolves a WPCS warning:
{{{
Variable "$expiresOffset" is not in valid snake_case format, try "$expires_offset"
}}}

Follow-up to [4109], [21996].

See #59650.

git-svn-id: https://develop.svn.wordpress.org/trunk@57532 602fd350-edb4-49c9-b593-d223f7449a82

* Script Loader: Remove unused `WP_Scripts::get_unaliased_deps()` method.

This private method was introduced in [56033] / #12009 but it's not actually used.
It was part of the inline script implementation which was later reverted before final merge.
The method can be safely removed because it’s private and cannot be used by extenders.

Props joemcgill.
Fixes #60438.

git-svn-id: https://develop.svn.wordpress.org/trunk@57533 602fd350-edb4-49c9-b593-d223f7449a82

* Build/Test Tools: Update the `codecov/codecov-action` action.

This updates the `codecov/codecov-action` from version `3.1.5` to `4.0.1`.

Version 4 switches to using the Codecov CLI to upload test report data, and changes the version of Node.js used for the action to 20.x. This fixes the notices currently shown for the test coverage workflow.

Props: mukesh27.
See #59658.

git-svn-id: https://develop.svn.wordpress.org/trunk@57534 602fd350-edb4-49c9-b593-d223f7449a82

* General: Add tests for `array_is_list` polyfill added in r57337.
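
The tests exercise the behavior of a polyfill along the lines of the following minimal sketch, which may differ in detail from the one added in r57337: an array is a list when its keys are exactly 0, 1, 2, ... in order.

```php
<?php
// Sketch of an array_is_list() polyfill for PHP < 8.1; the guard keeps the
// native function when it already exists.
if ( ! function_exists( 'array_is_list' ) ) {
	function array_is_list( array $arr ): bool {
		$expected_key = 0;
		foreach ( $arr as $key => $value ) {
			// Any key out of the 0, 1, 2, ... sequence disqualifies the array.
			if ( $key !== $expected_key ) {
				return false;
			}
			++$expected_key;
		}
		return true;
	}
}
```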

Props costdev.
See #55105.

git-svn-id: https://develop.svn.wordpress.org/trunk@57535 602fd350-edb4-49c9-b593-d223f7449a82

* Build/Test Tools: Pass a token to the Codecov action.

Version 4 of the action now requires a token to be provided in order to upload coverage results.

Follow up to [57534].

Props swissspidy.
See #59658.

git-svn-id: https://develop.svn.wordpress.org/trunk@57536 602fd350-edb4-49c9-b593-d223f7449a82

* Upload: Fallback to `PclZip` to validate ZIP file uploads.

`ZipArchive` can fail to validate ZIP files correctly and report valid files as invalid. This introduces a fallback to `PclZip` to check validity of files if `ZipArchive` fails them.

This introduces the new function `wp_zip_file_is_valid()` to validate archives.

Follow up to [57388].

Props audunmb, azaozz, britner, cdevroe, colorful-tones, costdev, courane01, endymion00, feastdesignco, halounsbury, jeffpaul, johnbillion, jorbin, jsandtro, karinclimber, kevincoleman, koesper, maartenbelmans, mathewemoore, melcarthus, mujuonly, nerdpressteam, olegfuture, otto42, peterwilsoncc, room34, sayful, schutzsmith, stephencronin, svitlana41319, swissspidy, tnolte, tobiasbg, vikram6, welaunchio.
Fixes #60398.


git-svn-id: https://develop.svn.wordpress.org/trunk@57537 602fd350-edb4-49c9-b593-d223f7449a82

* Coding Standards: Rename the `$oSelf` variable in `WP_MatchesMapRegex::apply()`.

This resolves a WPCS warning:
{{{
Variable "$oSelf" is not in valid snake_case format, try "$o_self"
}}}

Follow-up to [11853], [38376].

See #59650.

git-svn-id: https://develop.svn.wordpress.org/trunk@57538 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Introduce the Font Library post types and low level APIs.

This is the first step towards adding the font library to WordPress.
This commit includes the font library and font face CPTs.
It also adds the necessary APIs and classes to register and manipulate font collections.

This PR backports the font library post types and low-level APIs to Core. This is the first step toward including the font library entirely in Core. Once this is merged, we'll open a PR with the necessary REST API controllers.

Props youknowriad, get_dave, grantmkin, swissspidy, hellofromtonya, mukesh27, mcsf.
See #59166.

git-svn-id: https://develop.svn.wordpress.org/trunk@57539 602fd350-edb4-49c9-b593-d223f7449a82

* Editor: Fix Font Library PHP unit tests.

The font asset files used in the PHPUnit tests were missing from the original commit [57539].

Props mukesh27.
See #59166.

git-svn-id: https://develop.svn.wordpress.org/trunk@57540 602fd350-edb4-49c9-b593-d223f7449a82

* Coding Standards: Fix array key alignment after [57539].

See #59166.

git-svn-id: https://develop.svn.wordpress.org/trunk@57541 602fd350-edb4-49c9-b593-d223f7449a82

* HTML API: Join text nodes on invalid-tag-name boundaries.

A fix was introduced to the Tag Processor to ensure that contiguous text
in an HTML document emerges as a single text node spanning the full
sequence. Unfortunately, that patch was marginally over-zealous in
checking if a "<" started a syntax token or not. It used the following:

{{{
<?php
if ( 'A' <= $c && 'z' >= $c ) { ... }
}}}

This was based on the assumption that the A-Z and a-z letters are
contiguous in the ASCII range; they aren't, and there's a gap of six
characters in between (`[`, `\`, `]`, `^`, `_`, and the backtick).
The result of this is that in some cases the parser created a text
boundary when it didn't need to. Text boundaries can be surprising and
are also created when reaching invalid syntax, HTML comments, and other
hidden elements, so semantically this wasn't a major bug, but it was an
aesthetic challenge.

In this patch the check properly compares against both the upper-case and
lower-case ranges of letters that could potentially form tag names.

{{{
<?php
if ( ( 'A' <= $c && 'Z' >= $c ) || ( 'a' <= $c && 'z' >= $c ) ) { ... }
}}}

This solves the problem and ensures that contiguous text appears
as a single text node when scanning tokens.
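
The difference between the two checks can be demonstrated directly. This standalone sketch (the function names are hypothetical) prints every byte that the original single-range check accepts but the corrected two-range check rejects:

```php
<?php
// The original check: one range from "A" (0x41) through "z" (0x7A).
function is_letter_buggy( string $c ): bool {
	return 'A' <= $c && 'z' >= $c;
}

// The corrected check: the upper-case and lower-case ranges separately.
function is_letter_fixed( string $c ): bool {
	return ( 'A' <= $c && 'Z' >= $c ) || ( 'a' <= $c && 'z' >= $c );
}

// Print every single byte on which the two checks disagree.
foreach ( range( 0, 255 ) as $byte ) {
	$c = chr( $byte );
	if ( is_letter_buggy( $c ) !== is_letter_fixed( $c ) ) {
		echo $c;
	}
}
// Prints: [\]^_`
```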

Developed in https://github.com/WordPress/wordpress-develop/pull/6041
Discussed in https://core.trac.wordpress.org/ticket/60…