Authors: Aaron Gustafson
This document is intended as a starting point for engaging the community and standards bodies in developing collaborative solutions fit for standardization. As the solutions to problems described in this document progress along the standards-track, we will retain this document as an archive and use this section to keep the community up-to-date with the most current standards venue and content location of future work and discussions.
- This document status: Withdrawn
- Note: We are now directing our efforts into providing feedback on the similar proposal ScrollToTextFragment already being incubated in the W3C Web Incubator Community Group.
URLs have been a fantastic way to enable people to reference content on the web. With the addition of anchor points within Web documents (via `name` and `id`), authors were empowered to link to specific sections of their own documents or of other documents across the web. Unfortunately, that utility requires every web page author to add those anchor points. Numerous systems have cropped up to automate `id`-ing of headings (e.g., Kramdown’s `auto_id_prefix`), but they are seldom used. They are also limited to headings and provide no direct access to flow content.
Enabling users to link directly to arbitrary text content would overcome these current limitations and greatly increase the richness of a link in the context of social sharing, reporting, and legal/scientific/scholarly writing.
In terms of W3C recommendations, we have a handful of extant URL fragment types:
- Named Anchors & Identified Elements - Use of anchor (`a`) elements with a `name` attribute (obsolete in HTML5, but still supported for backwards compatibility) and a unique `id` attribute value on flow content. Both can be referenced as part of a URL via the fragment identifier.
- Media Fragments - Media Fragments support complex references in terms of time (the media’s duration) and space (a viewable region of the media composition). This is accomplished via any (or all) of a collection of name-value components: `t` for time, `xywh` for spatial data, and `track` to select a specific audio or video track. (An `id` component has also been proposed, but is still being discussed due to implementation challenges.) Example: `example.com/media#xywh=160,120,320,24&t=10,20&track=audio&track=video`
- SVG - SVG supports referencing the `id` of a child element in order to display only that element. SVG also supports an `svgView(…)` functional keyword that enables a link to alter the SVG’s display using one or more semicolon-separated named functions: `viewBox(...)`, `preserveAspectRatio(...)`, `transform(...)`, `zoomAndPan(...)`, and `viewTarget(...)`. These functions correspond to parameter values for attributes on the root `svg` element. Example: `MyDrawing.svg#svgView(viewBox(0,200,1000,1000))`
This is not a new challenge, and solutions have been proposed and implemented in other media. It’s worth considering their approaches and their relative applicability in this situation:
- Plain Text Files - A method for enabling arbitrary text linking in plain text documents was proposed back in 2005. It involved three modes of fragment creation: positions and ranges, characters and lines, and regular expressions. The first two approaches require a document to have a fixed line length or unchanging structure, neither of which is common on the web. The last idea is interesting but unlikely to be human readable; that said, it could be a useful next step for this feature.
- EPUB - The EPUB standard supports a very robust means of identifying arbitrary content within a publication. The format is not very human readable (e.g. `epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/2/1:3[yyy])`) and is heavily dependent on document structure. Document structure within an EPUB is unlikely to change often, making this a far more robust strategy there than it would be on the web (where redesigns and dynamic content are common).
- Similar approaches have involved CSS selectors and XPath, neither of which is really tailored to the selection of arbitrary text.
Taking a page from the direction Media Fragments have gone, we’re recommending a name-value component in the fragment. This would enable User Agents to disambiguate anchor references from arbitrary text searches. It should avoid collisions with existing `id` values in almost all cases.
https://domain.tld/page.html#search=arbitrary%20text%20search
Note: The named keyword could be anything not currently reserved as part of another specification. We recommend sticking to something that is human readable and brief, or an abbreviation of a familiar, human-readable term. Other options include "s" (a common shorthand for "search"), "query", and "q". The question mark (?) could also be used: it is allowed within a fragment, and because it would appear after the hash (#) that begins the fragment, it should not collide with arguments passed in the query (GET) string.
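As a rough illustration of how a User Agent might tell the proposed name-value fragment apart from an ordinary anchor, here is a minimal TypeScript sketch. The function name `getTextSearch` and the `&`-separated component syntax (borrowed from Media Fragments) are assumptions for illustration, not part of any specification; `search` is used as the keyword, but any of the alternatives above would be handled the same way.

```typescript
// Hypothetical keyword; "s", "query", or "q" would be handled identically.
const SEARCH_KEY = "search";

// Returns the decoded arbitrary-text search from a URL such as
// "https://domain.tld/page.html#search=arbitrary%20text%20search",
// or null when the fragment is absent or is an ordinary anchor like "#intro".
function getTextSearch(url: string): string | null {
  const hashIndex = url.indexOf("#");
  if (hashIndex === -1) return null; // no fragment at all
  const fragment = url.slice(hashIndex + 1);

  // Assumed Media Fragments-style syntax: "&"-separated "name=value" components.
  // A literal "&" inside the searched text would therefore need to be sent as %26.
  for (const component of fragment.split("&")) {
    const eq = component.indexOf("=");
    if (eq === -1) continue; // not a name-value pair (e.g. a plain id)
    if (component.slice(0, eq) !== SEARCH_KEY) continue; // some other component
    try {
      return decodeURIComponent(component.slice(eq + 1));
    } catch {
      return null; // malformed percent-encoding
    }
  }
  return null; // no search component: treat as a regular id/name anchor
}

console.log(getTextSearch("https://domain.tld/page.html#search=arbitrary%20text%20search")); // arbitrary text search
console.log(getTextSearch("https://domain.tld/page.html#intro")); // null
```

An element whose `id` happened to begin with `search=` would still collide with this scheme, which is why collisions are avoided only "in almost all cases."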
In order to enable linking to arbitrary text, some degree of text search is required. While a phrase or sentence contained within a single element would present few challenges in a "find and highlight" scenario, the following situations should be considered:
- Any combination of one or more of the following:
  - Repetition of the same phrase throughout a document;
  - A phrase that crosses element boundaries;
  - A text phrase that appears in the content with an image in the middle of it.
- Tables of contents where arbitrary text links might link to themselves as well (e.g. `<a href="#search=arbitrary%20text">arbitrary text</a>`)
- Sensitivity to capitalization
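Several of these situations (repetition, phrases that cross element boundaries, an image interrupting a phrase, and capitalization) can be handled by searching the page’s concatenated text rather than each element separately. The TypeScript sketch below models the page as a hypothetical array of text-node strings in document order, matches case-insensitively while treating any run of whitespace as equivalent, and maps each match back to start and end positions within individual nodes, much as a DOM `Range` would. It is a simplification under those assumptions: a real implementation would walk actual `Text` nodes and would also need to account for generated content and bidirectional text.

```typescript
// Hypothetical model: each string stands in for one DOM Text node, in document order.
interface TextPoint { node: number; offset: number; }
interface TextMatch { start: TextPoint; end: TextPoint; }

// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Maps an offset in the concatenated text back to a (node, offset) pair.
// An end offset that falls exactly on a node boundary is attributed to the
// earlier node, so a match never "ends" at offset 0 of the following node.
function locate(nodes: string[], globalOffset: number, isEnd: boolean): TextPoint {
  let consumed = 0;
  for (let i = 0; i < nodes.length; i++) {
    const nodeEnd = consumed + nodes[i].length;
    const inside = isEnd
      ? globalOffset > consumed && globalOffset <= nodeEnd
      : globalOffset >= consumed && globalOffset < nodeEnd;
    if (inside) return { node: i, offset: globalOffset - consumed };
    consumed = nodeEnd;
  }
  throw new RangeError(`offset ${globalOffset} is outside the text`);
}

// Finds every non-overlapping, case-insensitive occurrence of `query`,
// treating any run of whitespace in the query as matching any run in the page.
function findAll(nodes: string[], query: string): TextMatch[] {
  const words = query.trim().split(/\s+/).filter((w) => w !== "");
  if (words.length === 0) return []; // nothing to search for
  const re = new RegExp(words.map(escapeRegExp).join("\\s+"), "gi");
  const text = nodes.join("");
  const matches: TextMatch[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    matches.push({
      start: locate(nodes, m.index, false),
      end: locate(nodes, m.index + m[0].length, true),
    });
  }
  return matches;
}

// "arbitrary" sits in its own element (e.g. <em>); the second occurrence wraps a line.
const page = ["Linking to ", "arbitrary", " text is handy. Arbitrary\n", "text again."];
// Two matches: one spanning nodes 1 to 2, one spanning nodes 2 to 3.
console.log(JSON.stringify(findAll(page, "arbitrary text")));
```

Returning every match, rather than only the first, is what would let a UA step through repeated occurrences the way its existing "find in page" feature does.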
Every User Agent supports some form of in-page search. These tools are familiar to users and have already addressed many of the challenges presented above. It makes sense for User Agents to tie this sort of feature directly in with their existing search implementation. Therefore, we are recommending that the existence of an arbitrary text search in a URL trigger a UA’s "find in page" function after the page is fully rendered. The arbitrary text fragment should be supplied as the search value.
When it comes to creating links adhering to this API, authors can, of course, create them by hand, but it is much more likely that users will generate these links within their User Agent of choice. As such, User Agents will need to provide one or more mechanisms to generate these URLs when a user has selected text in a page. For example:
- When accessing the context menu with text selected within the current web page, the UA might provide a menu option to "copy a link to this text."
- When invoking a "share" action from within the UA while text is selected, the UA would automatically append the search fragment to the page’s URL.
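A minimal sketch of that link-generation step, assuming the `#search=` keyword from the example URL above. `linkToText` is a hypothetical name, and a real UA would take the text from the user’s current selection rather than a string argument. The sketch collapses whitespace (selections frequently span line breaks) and relies on `encodeURIComponent`, which percent-encodes spaces, `#`, `&`, and `+`.

```typescript
// Hypothetical helper behind a "copy a link to this text" menu item.
function linkToText(pageUrl: string, selectedText: string): string {
  // Drop any existing fragment; this sketch does not try to combine it with the search.
  const hashIndex = pageUrl.indexOf("#");
  const base = hashIndex === -1 ? pageUrl : pageUrl.slice(0, hashIndex);
  // Selections often span line breaks and indentation; collapse them to single spaces.
  const normalized = selectedText.trim().replace(/\s+/g, " ");
  // encodeURIComponent emits %20 for spaces, %23 for "#", %26 for "&", and %2B for "+".
  return `${base}#search=${encodeURIComponent(normalized)}`;
}

console.log(linkToText("https://domain.tld/page.html#intro", "arbitrary\n  text search"));
// https://domain.tld/page.html#search=arbitrary%20text%20search
```

Discarding the existing fragment keeps the generated link unambiguous, at the cost of losing the original anchor; whether the two should be combined is left open here.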
CSS currently supports the `:target` pseudo-class selector, which may seem applicable in this scenario as well, but isn’t actually appropriate. Given that an arbitrary text string may show up more than one time and the user may shift focus between those instances, we are proposing a new CSS pseudo-class selector: `:result`.
```css
/* applies to all arbitrary text search results */
:result {
  background: #cec;
}
/* applies to the currently focused arbitrary text search result */
:result:focus {
  background: #ffa;
}
```
- What happens if the text contains a `#` (e.g., a hashtag)?
- Does generated content get factored in?
- Do things like text transformation, bi-directional, and mixed language modes cause issues for in-page search?
- What are the expectations around how URL encoding/decoding is handled (e.g., `%20` for space)?
Exploration within the IndieWeb community (which refers to these as "fragmentions") and elsewhere has demonstrated what is possible with client-side implementations of arbitrary text fragments using JavaScript. Some argue that this sort of functionality should exist at the discretion of each site owner, but a strong counterargument is that universal applicability of this functionality would benefit all users and aligns with the Priority of Constituencies, which places users’ needs ahead of those of authors.
It’s worth pointing out that there are pros and cons to both approaches:
| | Pros | Cons |
|---|---|---|
| Document | Author controls the experience | Author must define the experience for users to benefit |
| Client | | Authors may not be able to control the experience |
Kevin Marks and others within the IndieWeb community have explored a variety of approaches for denoting arbitrary text searches in a URL.
The IndieWeb community favors re-purposing the existing fragment identifier (a single hash character), with general agreement that spaces (which will need to be encoded) provide enough differentiation from existing fragment identifiers, since HTML `id` values cannot contain whitespace. To use this approach, any raw hash (#) or whitespace would need to be encoded. It’s worth noting that this approach could still confuse the User Agent when a fragment could reference either an existing anchor or arbitrary text (for example, a single word that is also an element’s `id`).
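The single-hash heuristic can be sketched as follows. This is not the IndieWeb’s published code; it is a minimal TypeScript illustration assuming the UA knows the set of `id` values in the page, modeled here by a hypothetical `documentIds` set. A fragment that decodes to text containing whitespace is treated as a text search, while a single word is ambiguous and is resolved here, as one possible policy, by preferring an existing anchor.

```typescript
type FragmentTarget =
  | { kind: "anchor"; id: string }
  | { kind: "text"; text: string };

// Resolves a single-hash fragment (the part after "#") either to an element id
// or to an arbitrary text search. `documentIds` is a hypothetical stand-in for
// the id values present in the page.
function resolveFragment(fragment: string, documentIds: ReadonlySet<string>): FragmentTarget {
  let decoded: string;
  try {
    decoded = decodeURIComponent(fragment);
  } catch {
    decoded = fragment; // malformed percent-encoding: fall back to the raw value
  }
  // HTML id values cannot contain whitespace, so this can only be a text search.
  if (/\s/.test(decoded)) return { kind: "text", text: decoded };
  // A single word is ambiguous: prefer an existing anchor, else search the text.
  return documentIds.has(decoded)
    ? { kind: "anchor", id: decoded }
    : { kind: "text", text: decoded };
}

const ids = new Set(["intro", "summary"]);
console.log(resolveFragment("arbitrary%20text", ids)); // a text search
console.log(resolveFragment("summary", ids)); // the #summary anchor
```

The second call shows the ambiguity in practice: someone who meant to link to the word "summary" in running text would be taken to the anchor instead.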
An earlier proposal from the IndieWeb community involved using a double hash prefix. The URL spec doesn’t allow hash characters in a fragment, so these links will fail strict validation. HTML now also allows the hash character in an `id` attribute, which could likewise confuse the User Agent when a link could reference either an existing anchor or arbitrary text.
There has been some discussion over whether whitespace should be represented as an encoded space (%20) or a plus (+) in the arbitrary text fragment. The latter would require any literal plus characters in the text to be encoded as well (as %2B).
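The trade-off can be seen with two hypothetical decoders, sketched in TypeScript below. `decodePercentOnly` treats `+` as a literal plus and expects spaces as `%20`; `decodePlusAsSpace` follows the HTML form-encoding convention in which `+` means a space, so a literal plus must arrive as `%2B`. Neither function name comes from any specification.

```typescript
// Percent-only decoding: "+" stays a literal plus; spaces must be sent as %20.
function decodePercentOnly(value: string): string {
  return decodeURIComponent(value);
}

// Form-style decoding: "+" means a space, so a literal plus must be sent as %2B.
// The "+" replacement happens before percent-decoding so an encoded %2B survives.
function decodePlusAsSpace(value: string): string {
  return decodeURIComponent(value.replace(/\+/g, " "));
}

const fragmentText = "1+1%3D2";
console.log(decodePercentOnly(fragmentText)); // 1+1=2
console.log(decodePlusAsSpace(fragmentText)); // 1 1=2
```

The same fragment therefore searches for different text depending on which convention a UA adopts, which suggests the choice should be settled before such links are widely shared.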
- Chrome Extension
- JS implementation
- WordPress plugins: wp-fragmention & Fragmentions
- Drupal Plugin
- NY Times
- Save Publishing
- Business Insider
- The Guardian
- Addressable
- Media Fragments - https://www.w3.org/TR/media-frags/
- CSS Selectors (only to elements though) - https://developer.mozilla.org/docs/Web/CSS/CSS_Selectors
- XPath (only to elements though) - https://developer.mozilla.org/docs/Web/XPath
- DOM Range - https://developer.mozilla.org/docs/Web/API/Range
- EPUB Canonical Fragment Identifiers - http://www.idpf.org/epub/linking/cfi/epub-cfi.html
- SVG IRI fragments - https://www.w3.org/TR/SVG/linking.html#LinksIntoSVG
- Web Annotations Selectors - https://www.w3.org/TR/2017/NOTE-selectors-states-20170223/#frags
- Fragment identifiers for plain text files [PDF] - http://dret.net/netdret/docs/wilde-ht2005-textfrag.pdf
- Using URLs to pass parameters to web applications, widgets and gadgets - http://internet-apps.blogspot.in/2007/11/using-urls-to-pass-parameters-to-web.html
- Scroll to Text Fragment