Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: link lcp attribution to image resource and navigation page load #448

Merged
merged 14 commits into from
Oct 2, 2023

Conversation

williazz
Copy link
Contributor

@williazz williazz commented Sep 11, 2023

Summary

LCP can be attributed to other RumEvents.

  1. If LCP resource is an image, then a site owners want to know how that image was loaded
  2. Site owners want to know if LCP is blocked by the page load sequence

Revision 1

  1. Link LCP Attribution to ResourceEvent eventId for image load, if any
  2. Link LCP Attribution to NavigationEntry eventId of page load, if any

Two callouts in revision 1

  1. Caught a bug where EventBus was instantiated twice
  2. Populate resourceEvent with startTime

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@williazz williazz force-pushed the LinkLCPAttr branch 2 times, most recently from b54956b to 931c30c Compare September 11, 2023 20:33
@williazz williazz changed the title chore: link lcp to image resource and nav event chore: link lcp attribution to image resource and navigation page load Sep 11, 2023
@williazz williazz changed the title chore: link lcp attribution to image resource and navigation page load feat: link lcp attribution to image resource and navigation page load Sep 11, 2023
@williazz williazz force-pushed the LinkLCPAttr branch 3 times, most recently from 7f4b711 to 24b9ddc Compare September 11, 2023 23:24
* Creates key to link a RumEvent to the PerformanceEntry that it is sourced from
* e.g. performanceKey(ResourceEvent) === performanceKey(PerformanceResourceTiming)
*/
export const performanceKey = (details: HasLatency) =>
Copy link
Contributor Author

@williazz williazz Sep 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the general purpose of performanceKey() deserves some discussion.

All RumEvents that are sourced from PerformanceEntries share (or will share) fields duration and startTime. These fields are highly unique because they are DOMHighResTimestamps. The only edge case I am worried about for collisions are when startTime or duration is zero, which can happen when resources are cached.

This pattern can also used to link monkey patched HttpEvents to PerformanceResourceTimings where initatorType = Fetch || Xhr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good discussion -- consider adding it to the method's JavaDoc comment for posterity.

startTime or duration is zero... can happen when resources are cached

Does this happen in practice? Do we need some logic in place to prevent conflicts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment. It was my initial worry that duration might be zero when resources are cached. But when attempting to produce this, I have not seen duration actually be zero. Also, startTime will still be extremely unique.

@@ -402,7 +402,8 @@ export class Orchestration {
id: applicationId,
version: applicationVersion
},
this.config
this.config,
this.eventBus
Copy link
Contributor Author

@williazz williazz Sep 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Callout (1): this was a bug that was included in #445. Currently implementing robust tests to prevent this in the futrue

@@ -120,6 +120,7 @@ export class ResourcePlugin extends InternalPlugin {
const eventData: ResourceEvent = {
version: '1.0.0',
initiatorType: entryData.initiatorType,
startTime: entryData.startTime,
Copy link
Contributor Author

@williazz williazz Sep 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Callout (2): startTime needs to be captured in order for ResourceEvent to be linked to its source PerformanceEvent

}
private resourceEventIds = new Map<string, string>();
private navigationEventsIds: string[] = [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Since we only ever use the first, this doesn't need to be an array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, changed to a private string var

@ps863
Copy link
Member

ps863 commented Sep 13, 2023

comment: I think the tests here need an update based on these changes

@williazz williazz force-pushed the LinkLCPAttr branch 2 times, most recently from 7efdf50 to b11850a Compare September 14, 2023 07:05
@williazz williazz marked this pull request as ready for review September 14, 2023 07:06
@williazz
Copy link
Contributor Author

williazz commented Sep 14, 2023

comment: I think the tests here need an update based on these changes

It was still in draft state, but made ready for review after adding unit tests.


export const WEB_VITAL_EVENT_PLUGIN_ID = 'web-vitals';

export class WebVitalsPlugin extends InternalPlugin {
constructor() {
super(WEB_VITAL_EVENT_PLUGIN_ID);
this.lcpHelper = this.lcpHelper.bind(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: If you use an arrow function to declare lcpHelper, it will bind lcpHelper to the 'class' as you are explicitly doing here (because arrow functions are bound to the object which declared them).

@@ -34,28 +47,66 @@ export class WebVitalsPlugin extends InternalPlugin {
configure(config: any): void {}

protected onload(): void {
this.context.eventBus.subscribe(Topic.EVENT, this.lcpHelper); // eslint-disable-line
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (non-blocking): Why doesn't eslint like this?

const a = (metric as LCPMetricWithAttribution).attribution;
const attribution: any = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Add some type safety by adding an Attribution type.

interface RumLCPAttribution {
    element?: string;
    url?: string;
    timeToFirstByte: number;
    resourceLoadDelay: number;
    resourceLoadTime: number;
    elementRenderDelay: number;
    lcpResourceEntry?: string;
    navigationEntry?: string;
}
Suggested change
const attribution: any = {
const attribution: RumLCPAttribution = {

Comment on lines +85 to +97
onLCP: jest.fn().mockImplementation((callback) => {
context.eventBus.dispatch(Topic.EVENT, imageResourceRumEvent);
context.eventBus.dispatch(Topic.EVENT, navigationRumEvent);
callback(mockLCPDataWithImage);
}),
Copy link
Member

@qhanam qhanam Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (blocking): Should we test any integrations? I'd like some evidence that everything works together as intended.

E.g.,

  1. Between nav/resource and web vitals plugin.
  2. Between browser and web vitals plugin.
  3. Between web vitals lib and web vitals plugin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. FWIW manual testing had no problems so far

disable(): void {} // eslint-disable-line

protected onload(): void {
this.context.eventBus.subscribe(Topic.EVENT, this.subscriber); // eslint-disable-line
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't see any lint errors... could we remove this?

Suggested change
this.context.eventBus.subscribe(Topic.EVENT, this.subscriber); // eslint-disable-line
this.context.eventBus.subscribe(Topic.EVENT, this.subscriber);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@typescript-eslint/unbound-method

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh nvm this is test code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll add the specific rule in my next revision where necessary

@@ -34,28 +47,63 @@ export class WebVitalsPlugin extends InternalPlugin {
configure(config: any): void {}

protected onload(): void {
this.context.eventBus.subscribe(Topic.EVENT, this.handleEvent); // eslint-disable-line
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Instead of disabling es-lint here, you can fix this warning by (A) using an arrow function to declare handleEvent, or (B) annotating the function with this: WebVitalsPlugin.

@williazz
Copy link
Contributor Author

williazz commented Sep 14, 2023

Problem

There is something about TestCafe that prevents the ResourcePlugin from recording the LCP resource's image. This prevents me from testing the event linking for LCPAttribution.lcpResourceEntry. However, this works every time during manual testing with npm run server

My setup:

// fs
    app/
       ...
       web_vital_event.html
       assets/
           lcp.jpg
// web_vital_event.html
... body {
    ...
    <img src="assets/lcp.jpg" />
}...
// loader-web-vital.js
...
loader('cwr', 'abc123', '1.0', 'us-west-2', './rum_javascript_telemetry.js', {
    dispatchInterval: 0,
    metaDataPluginsToLoad: [],
    eventPluginsToLoad: [
        new ResourcePlugin(),
        new NavigationPlugin(),
        new WebVitalsPlugin()
    ],
    telemetries: [],
    clientBuilder: showRequestClientBuilder
});
...
// WebVitalsPlugin.test.ts (integ)
test('when lcp image resource is recorded then it is attributed to lcp', async (t: TestController) => {
    const browser = t.browser.name;
    if (browser === 'Safari' || browser === 'Firefox') {
        return 'Test is skipped';
    }
    await t.wait(300);

    await t
        // Interact with page to trigger lcp event
        .click(testButton)
        .click(makePageHidden)
        .expect(RESPONSE_STATUS.textContent)
        .eql(STATUS_202.toString())
        .expect(REQUEST_BODY.textContent)
        .contains('BatchId');

    const events = JSON.parse(await REQUEST_BODY.textContent).RumEvents;
    // console.log(events);
    const lcp = events.filter(
        (x: { type: string }) => x.type === LCP_EVENT_TYPE
    )[0];
    const resource = events.filter(
        (x: { details: string; type: string }) =>
            x.type === PERFORMANCE_RESOURCE_EVENT_TYPE &&
            x.details.includes('lcp.jpg')
    )[0];
    await t.expect(lcp.details).contains(`"lcpResourceEntry":"${resource.id}"`);
    // NOTE: lcp.details does include the correct targetUrl
    // Therefore, the PerformanceAPI is working under the hood, but not the ResourcePlugin
});

@williazz
Copy link
Contributor Author

There is something about TestCafe

This was fixed by upgrading test cafe XD

@williazz
Copy link
Contributor Author

williazz commented Sep 20, 2023

This is an anti-pattern, but I have included #451 in this PR because it is required to pass the integ tests. Please merge #451 before merging #448

Edit: #451 has been merged

* Creates key to link a RumEvent to the PerformanceEntry that it is sourced from
* e.g. performanceKey(ResourceEvent) === performanceKey(PerformanceResourceTiming)
*/
export const performanceKey = (details: HasLatency) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good discussion -- consider adding it to the method's JavaDoc comment for posterity.

startTime or duration is zero... can happen when resources are cached

Does this happen in practice? Do we need some logic in place to prevent conflicts?

@williazz
Copy link
Contributor Author

issue: WebVitals plugin needs to know if onLCP is supported before caching image resources to avoid memory leak

@williazz williazz requested a review from ps863 September 25, 2023 18:56
@williazz
Copy link
Contributor Author

williazz commented Sep 25, 2023

issue: WebVitals plugin needs to know if onLCP is supported before caching image resources to avoid memory leak

There are two solutions to this:

  1. Check browser support at runtime.
  2. After 10 seconds, cleanup cache and stop listening for LCP resource candidates.

(1) is explicitly not recommended by by the owners of web-vitals GoogleChrome/web-vitals#37 (comment).

If we implemented (1) anyways, we would check support for PerformanceObserver + PerformancePaintTiming. Some additional analysis is needed to see what other dependencies the web-vitals requires. In the worst case, we never cleanup resourceEventIds and cause a small memory leak. If web-vitals package implements a polyfill for LCP in the future, this would have to be updated to avoid data loss.

(2) looks janky but is the simplest solution and my recommendation. The benefit is that we do not have to worry about many edge cases or do any maintenance work. The risk is that we are setting a hard limit--if it takes longer than ~10 seconds to report LCP, then we do not attribute lcpResourceEntry

@ps863
Copy link
Member

ps863 commented Sep 26, 2023

issue: WebVitals plugin needs to know if onLCP is supported before caching image resources to avoid memory leak

There are two solutions to this:

  1. Check browser support at runtime.
  2. After 10 seconds, cleanup cache and stop listening for LCP resource candidates.

(1) is explicitly not recommended by by the owners of web-vitals GoogleChrome/web-vitals#37 (comment).

If we implemented (1) anyways, we would check support for PerformanceObserver + PerformancePaintTiming. Some additional analysis is needed to see what other dependencies the web-vitals requires. In the worst case, we never cleanup resourceEventIds and cause a small memory leak. If web-vitals package implements a polyfill for LCP in the future, this would have to be updated to avoid data loss.

(2) looks janky but is the simplest solution and my recommendation. The benefit is that we do not have to worry about many edge cases or do any maintenance work. The risk is that we are setting a hard limit--if it takes longer than ~10 seconds to report LCP, then we do not attribute lcpResourceEntry

I think approach 1 has other issues as well:

  • Browser support can change anytime (with new releases of browser versions, etc). This approach would require far more maintenance potentially if you are doing an explicit check.

Question on approach 2:

  • Why 10 seconds? Is this based on a recommendation? What if the time window were smaller?

@williazz
Copy link
Contributor Author

williazz commented Sep 27, 2023

Follow up on offline conversation...

Why 10 seconds? Is this based on a recommendation? What if the time window were smaller?

10 sec is an arbitrary duration where LCP has probably been recorded already, if supported. We could do another time.

We ultimately opted for (1), and should keep an eye on updates to LargestContentfulPaint browser support and web-vitals.

@ps863 ps863 merged commit 4b8506e into aws-observability:main Oct 2, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants