HTML API: Track spans of text with (offset, length) instead of (start, end)

This patch follows up on earlier design questions about how to represent
spans of strings inside the class. It's relevant now as preparation for #5683.

The mixture of (offset, length) and (start, end) coordinates becomes confusing
at times, and all final string operations are performed with the (offset, length)
pair, since these feed into `substr()`.
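
To illustrate why the (offset, length) pair is the more direct fit, here is a minimal sketch in plain PHP. The `$html` input and all variable names are made up for illustration and do not come from the patch.

```php
<?php
// Hypothetical input; the names below are illustrative, not from the patch.
$html = '<div class="post">Hello</div>';

// (start, end) coordinates: `end` is taken here as the offset just past the span.
$start = 5;
$end   = 17;

// substr() takes an offset and a length, so (start, end) must be converted first.
$from_start_end = substr( $html, $start, $end - $start );

// (offset, length) coordinates feed into substr() directly.
$offset = 5;
$length = 12;

$from_offset_length = substr( $html, $offset, $length );

echo $from_start_end, "\n";
echo $from_offset_length, "\n";
```

Both calls select the same bytes; the second form simply avoids the subtraction at every call site.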

In preparation for exposing all tokens within an HTML document, this change:
 - Unifies the representation throughout the class.
 - Creates `token_starts_at` to track the start of the current token.
 - Replaces `tag_ends_at` with `token_length` for re-use with other token types.
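
As a rough sketch of how the two properties named above could work together, the class below is a hypothetical stand-in, not the real processor. Only the names `token_starts_at` and `token_length` come from this commit message; their exact semantics, the class name, and both methods are assumptions made for illustration.

```php
<?php
// Hypothetical stand-in for the processor's internal state; not the real class.
class Sketch_Token_State {
	public $html;
	public $token_starts_at; // Assumed: byte offset where the current token begins.
	public $token_length;    // Assumed: byte length of the current token.

	public function __construct( $html, $token_starts_at, $token_length ) {
		$this->html            = $html;
		$this->token_starts_at = $token_starts_at;
		$this->token_length    = $token_length;
	}

	// With (offset, length), reading the token text is a single substr() call.
	public function current_token_text() {
		return substr( $this->html, $this->token_starts_at, $this->token_length );
	}

	// An end offset (here, the offset just past the token) is still recoverable.
	public function token_ends_at() {
		return $this->token_starts_at + $this->token_length;
	}
}

// The current token here is a comment, not a tag, which is why a
// tag-specific name would no longer fit.
$state = new Sketch_Token_State( '<p>Hi</p><!-- note -->', 9, 13 );

echo $state->current_token_text(), "\n";
```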

There should be no functional or behavioral changes in this patch.

For the internal helper classes this patch introduces breaking changes, but those
classes are marked private and should not be used outside of the HTML API itself.
dmsnell committed Dec 10, 2023
1 parent 1850589 commit 0346ceb
Showing 4 changed files with 133 additions and 67 deletions.
38 changes: 32 additions & 6 deletions src/wp-includes/html-api/class-wp-html-attribute-token.php
@@ -15,6 +15,7 @@
*
* @access private
* @since 6.2.0
* @since 6.5.0 Replaced `end` with `length` to more closely match `substr()`.
*
* @see WP_HTML_Tag_Processor
*/
@@ -23,6 +24,7 @@ class WP_HTML_Attribute_Token {
* Attribute name.
*
* @since 6.2.0
*
* @var string
*/
public $name;
@@ -31,6 +33,7 @@ class WP_HTML_Attribute_Token {
* Attribute value.
*
* @since 6.2.0
*
* @var int
*/
public $value_starts_at;
@@ -39,6 +42,7 @@ class WP_HTML_Attribute_Token {
* How many bytes the value occupies in the input HTML.
*
* @since 6.2.0
*
* @var int
*/
public $value_length;
@@ -47,22 +51,43 @@ class WP_HTML_Attribute_Token {
* The string offset where the attribute name starts.
*
* @since 6.2.0
*
* @var int
*/
public $start;

/**
* The string offset after the attribute value or its name.
* Byte length of text spanning the attribute inside a tag.
*
* This span starts at the first character of the attribute name
* and it ends after one of three cases:
*
* - at the end of the attribute name for boolean attributes.
* - at the end of the value for unquoted attributes.
* - at the final single or double quote for quoted attributes.
*
* Example:
*
* <div class="post">
* ------------ length is 12, including quotes
*
* <input type="checked" checked id="selector">
* ------- length is 7
*
* <a rel=noopener>
* ------------ length is 12
*
* @since 6.5.0 Replaced `end` with `length` to more closely match `substr()`.
*
* @since 6.2.0
* @var int
*/
public $end;
public $length;

/**
* Whether the attribute is a boolean attribute with value `true`.
*
* @since 6.2.0
*
* @var bool
*/
public $is_true;
@@ -71,20 +96,21 @@ class WP_HTML_Attribute_Token {
* Constructor.
*
* @since 6.2.0
* @since 6.5.0 Replaced `end` with `length` to more closely match `substr()`.
*
* @param string $name Attribute name.
* @param int $value_start Attribute value.
* @param int $value_length Number of bytes attribute value spans.
* @param int $start The string offset where the attribute name starts.
* @param int $end The string offset after the attribute value or its name.
* @param int $length Byte length of the entire attribute name or name and value pair expression.
* @param bool $is_true Whether the attribute is a boolean attribute with true value.
*/
public function __construct( $name, $value_start, $value_length, $start, $end, $is_true ) {
public function __construct( $name, $value_start, $value_length, $start, $length, $is_true ) {
$this->name = $name;
$this->value_starts_at = $value_start;
$this->value_length = $value_length;
$this->start = $start;
$this->end = $end;
$this->length = $length;
$this->is_true = $is_true;
}
}
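
For readers adapting code to the new constructor, the following sketch shows how the `start`/`length` and `value_starts_at`/`value_length` pairs select text from the input. `Sketch_Attribute_Token` is a hypothetical stand-in that only mirrors the constructor signature shown in the diff; the real class is private to the HTML API. The offsets are counted by hand for this one input, and the assumption that `value_starts_at` points just past the opening quote belongs to this sketch.

```php
<?php
// Hypothetical stand-in mirroring the new constructor signature from the diff.
class Sketch_Attribute_Token {
	public $name;
	public $value_starts_at;
	public $value_length;
	public $start;
	public $length;
	public $is_true;

	public function __construct( $name, $value_start, $value_length, $start, $length, $is_true ) {
		$this->name            = $name;
		$this->value_starts_at = $value_start;
		$this->value_length    = $value_length;
		$this->start           = $start;
		$this->length          = $length;
		$this->is_true         = $is_true;
	}
}

$html = '<div class="post">';

// Offsets counted by hand for this input (an assumption of the sketch):
// the name starts at byte 5, the value at byte 12.
$attr = new Sketch_Attribute_Token( 'class', 12, 4, 5, 12, false );

// Both spans are read with the same (offset, length) shape.
$whole = substr( $html, $attr->start, $attr->length );
$value = substr( $html, $attr->value_starts_at, $attr->value_length );

echo $whole, "\n";
echo $value, "\n";
```

Code that previously relied on `end` can recover an equivalent offset as `$attr->start + $attr->length`.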
19 changes: 11 additions & 8 deletions src/wp-includes/html-api/class-wp-html-span.php
@@ -18,6 +18,7 @@
*
* @access private
* @since 6.2.0
* @since 6.5.0 Replaced `end` with `length` to more closely align with `substr()`.
*
* @see WP_HTML_Tag_Processor
*/
@@ -26,28 +27,30 @@ class WP_HTML_Span {
* Byte offset into document where span begins.
*
* @since 6.2.0
*
* @var int
*/
public $start;

/**
* Byte offset into document where span ends.
* Byte length of this span.
*
* @since 6.5.0
*
* @since 6.2.0
* @var int
*/
public $end;
public $length;

/**
* Constructor.
*
* @since 6.2.0
*
* @param int $start Byte offset into document where replacement span begins.
* @param int $end Byte offset into document where replacement span ends.
* @param int $start Byte offset into document where replacement span begins.
* @param int $length Byte length of span.
*/
public function __construct( $start, $end ) {
$this->start = $start;
$this->end = $end;
public function __construct( $start, $length ) {
$this->start = $start;
$this->length = $length;
}
}
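
A span stored as (start, length) can be passed straight to PHP's `substr_replace()`, which takes the same offset and length arguments as `substr()`. The sketch below uses a hypothetical `Sketch_Span` stand-in with the same two properties as the updated class; the input string and replacement text are illustrative only.

```php
<?php
// Hypothetical stand-in with the same two properties as the updated span class.
class Sketch_Span {
	public $start;
	public $length;

	public function __construct( $start, $length ) {
		$this->start  = $start;
		$this->length = $length;
	}
}

$html = '<p class="old">';

// The span covers the three bytes of `old`, starting at byte offset 10.
$span = new Sketch_Span( 10, 3 );

// substr_replace() accepts the (offset, length) pair without conversion.
$updated = substr_replace( $html, 'new', $span->start, $span->length );

echo $updated, "\n";
```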