Proposal: Measuring listens instead of downloads #1

Open
daveajones opened this issue Jul 22, 2021 · 5 comments

Comments

@daveajones

daveajones commented Jul 22, 2021

Ditching IP Address Measurement - Rationale

I would like to propose that this repo become the collaboration point for a move from measuring downloads to measuring true listens. Based on the industry-wide move to block tracking, and the recent announcement of Private Relay by Apple, I believe it is time to start designing what comes next when IP addresses become totally unreliable.

It has been said that true listening measurement is either not possible outside of streaming silos, or not feasible because it would require all apps to support it. I think that both of those are untrue. Apps are already implementing Podcasting 2.0 namespace features, which shows that podcast app developers are more than willing to engage and implement new technology. It just has to be in their customers' best interests. Privacy is top of mind for them. So, this is a point where collaboration makes a lot of sense.


The _guid Parameter

As a start, I'd like to propose the "_guid" url parameter be attached, by all apps, to all enclosure downloads. A _guid looks like this:

GET https://example.com/podcast/episode1.mp3?_guid=6975bcb2-32b5-4d16-b002-15a68ada2234

The first time a listener on a podcast app taps the play button on an episode, a unique, randomly generated GUID value (like the UUID in the example above) is created and sent along with the enclosure request. This GUID is then pinned internally within the app to that user, that podcast, and that episode. If the user ever listens to that episode again, or picks up from the point where they left off earlier, the same GUID parameter is sent along with the enclosure url in the GET request. This GUID statically represents this unique listen by this listener, and always will.
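As a rough illustration of the app-side behavior described above, here is a minimal Python sketch. Everything named in it (`ListenIdStore`, `guid_for`, `enclosure_url_with_guid`, the podcast and episode identifiers) is hypothetical and not part of the proposal, and it assumes the value is a random UUID, which matches the shape of the example URL.

```python
import uuid
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


class ListenIdStore:
    """Pins one random ID to each (podcast, episode) pair on this device."""

    def __init__(self):
        # A real app would persist this mapping; a dict is enough for a sketch.
        self._ids = {}

    def guid_for(self, podcast_id, episode_id):
        key = (podcast_id, episode_id)
        if key not in self._ids:
            # Created once, on the first play, then reused on every later play.
            self._ids[key] = str(uuid.uuid4())
        return self._ids[key]


def enclosure_url_with_guid(enclosure_url, guid):
    """Return the enclosure URL with _guid appended, keeping existing parameters."""
    parts = urlsplit(enclosure_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("_guid", guid))
    return urlunsplit(parts._replace(query=urlencode(query)))


store = ListenIdStore()
guid = store.guid_for("example-podcast", "episode1")
url = enclosure_url_with_guid("https://example.com/podcast/episode1.mp3", guid)
```

Because the ID is derived from nothing about the listener and is scoped to a single episode, it cannot be joined across episodes or shows, which is where the privacy benefit would come from.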

In this way, a unique anonymous value is born the first time a listener plays an episode. That’s a truly unique listen. And, it only gets counted once - whether that listener is at the office, at home, or out in the world. Because it doesn’t rely on an IP address, there is no need for complex IAB IP-range source filtering. The app is in control and ensures the value is reliably delivered.
With a scheme like this, listeners get privacy and podcasters get truthful listener numbers.
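On the hosting side, counting each listen once could be as simple as collecting distinct `_guid` values per enclosure. The sketch below is only an assumption about how a host might do that; `count_unique_listens` is a hypothetical helper and the second GUID in the sample data is a made-up placeholder.

```python
from urllib.parse import urlsplit, parse_qs


def count_unique_listens(request_urls):
    """Count distinct _guid values per enclosure path."""
    seen = {}
    for raw in request_urls:
        parts = urlsplit(raw)
        guids = parse_qs(parts.query).get("_guid")
        if not guids:
            continue  # Request carried no _guid; nothing to count here.
        seen.setdefault(parts.path, set()).add(guids[0])
    return {path: len(guids) for path, guids in seen.items()}


requests_seen = [
    "https://example.com/podcast/episode1.mp3?_guid=6975bcb2-32b5-4d16-b002-15a68ada2234",
    # Same listener resuming later: same _guid, so no new listen is counted.
    "https://example.com/podcast/episode1.mp3?_guid=6975bcb2-32b5-4d16-b002-15a68ada2234",
    # A different listener (placeholder GUID).
    "https://example.com/podcast/episode1.mp3?_guid=0b1c2d3e-0000-4000-8000-000000000001",
    # A request from an app that does not send _guid.
    "https://example.com/podcast/episode1.mp3",
]
counts = count_unique_listens(requests_seen)
```

Note that no IP address or user agent is consulted anywhere in the count.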


Fraud

Couldn't someone game the numbers by generating fake downloads with unique GUIDs? Sure they could. But, that can be done already with IAB downloads. Fraud detection is always part of the game in the world of digital advertising and attribution. The aim isn't to design a system that is un-gameable. That isn't possible. The goal is having a system that's simple and transparent enough to make fraud detection a fairly straightforward process.

But, the larger issue is that, if IP addresses aren't a reliable measurement source in the future, an in-band solution such as this will be the only way to measure. We should accept that, and build safeguards around it instead of wishing it were different.


Larger Effort

I envision this as, hopefully, the beginning of a larger effort to design a standard set of open url parameters and request headers to be used for listener attribution. Formats like the prepended underscore seen above can be implemented to avoid collisions with existing url parameters.

I'm seeing url parameters being used like this already in the wild for attribution. But, they are all proprietary or seem to be web only. An open, industry-wide standard would be the way forward.

@jamescridland
Collaborator

Thanks, Dave!

I'd see this being a benefit to all podcasters - it's backwards-compatible (in that it won't break any current podcast host analytics), and can be used to achieve some considerable benefit. However, this doesn't yet fix the issue in your header, "measuring listens instead of downloads". What it offers is a more reliable way to calculate reach/cume ("how many people are downloading this podcast?"); an auto-download would still look identical to a real-time listen from the download server's point of view. That said, any additional certainty would be beneficial.

I'd probably suggest that we don't use something called guid, if only because the podcast namespace also has a guid value which is entirely different, and it would be good not to confuse this episode-level guid with a podcast-level guid. Why don't we go for something like edid (episode-device-id) or idfea (identifier for episode analytics)? Or is there a better term that I'm not aware of?

@daveajones
Author

Good point. What if the spec is:

  • Every time the listener hits the play button on a file that has already been auto-downloaded, the app attempts a HEAD request with the _guid attached.
  • Auto-downloads get another parameter added, like “_autodownload=true” to differentiate it from a listen.

This would get closer to the mark by at least having two metrics to merge for a clearer picture of what was what.
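A minimal sketch of what those two request types might look like from the app side, using Python's standard library. The function names are hypothetical, nothing is actually sent over the network, and the sketch assumes the auto-download carries only `_autodownload=true` (the thread leaves open whether it should also carry a `_guid`).

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
from urllib.request import Request


def _with_params(url, extra):
    """Append extra query parameters, keeping whatever the URL already has."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True) + list(extra.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


def play_request(enclosure_url, guid, already_downloaded):
    """Build the request made when the listener hits play."""
    # If the file is already on the device, a HEAD request signals the play
    # without transferring the audio again; otherwise a normal GET fetches it.
    method = "HEAD" if already_downloaded else "GET"
    return Request(_with_params(enclosure_url, {"_guid": guid}), method=method)


def auto_download_request(enclosure_url):
    """Build the background fetch made without any listener action."""
    return Request(_with_params(enclosure_url, {"_autodownload": "true"}), method="GET")


head = play_request("https://example.com/podcast/episode1.mp3", "abc", already_downloaded=True)
fetch = auto_download_request("https://example.com/podcast/episode1.mp3")
```

The resulting `Request` objects could be passed to `urllib.request.urlopen` in a real client; here they only show the method and URL each case would produce.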

@daveajones
Author

Actually, there could be an “_action=” param that’s always included whether it’s a HEAD or GET. The actions can be “listen” or “download”. That would simplify log parsing I think.
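To show why a single `_action` parameter might simplify log parsing, here is a hedged sketch. The `(method, url)` log-entry shape and the `summarize` helper are assumptions made for illustration; real server logs would need their own parsing step first.

```python
from urllib.parse import urlsplit, parse_qs


def summarize(log_entries):
    """Summarize (method, url) log entries by their _action parameter."""
    listen_guids = set()
    downloads = 0
    untagged = 0
    for _method, url in log_entries:
        # The HTTP method is ignored on purpose: _action alone decides the bucket.
        params = parse_qs(urlsplit(url).query)
        action = params.get("_action", [None])[0]
        if action == "listen":
            guid = params.get("_guid", [None])[0]
            if guid:
                # Repeat plays reuse the same _guid, so a set counts each once.
                listen_guids.add(guid)
        elif action == "download":
            downloads += 1
        else:
            untagged += 1  # Apps that do not send _action at all.
    return {"listens": len(listen_guids), "downloads": downloads, "untagged": untagged}


log = [
    ("GET", "https://example.com/podcast/episode1.mp3?_action=download"),
    ("HEAD", "https://example.com/podcast/episode1.mp3?_guid=abc&_action=listen"),
    ("HEAD", "https://example.com/podcast/episode1.mp3?_guid=abc&_action=listen"),
    ("GET", "https://example.com/podcast/episode1.mp3?_guid=def&_action=listen"),
    ("GET", "https://example.com/podcast/episode1.mp3"),
]
summary = summarize(log)
```

In this sketch a "listen" entry without a `_guid` is dropped rather than counted, which is one possible policy and not something the thread specifies.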

@bryanmoffett

bryanmoffett commented Aug 16, 2021

Glad to see you pushing this, Dave. It's what we tried to do at NPR in 2017 with the open source Remote Audio Data effort among a dozen or so orgs.

At the time, there were two prevailing thoughts - one, that Apple and others would never adopt it, and two, that even outside Apple, there were a lot of vocal people who felt any kind of tracking around listens was too risky. RAD was designed to use throwaway session variables, too, but that did not seem enough at the time. NPR did implement a proof of concept of the beta spec in NPR One, and it worked.

I will note two things: One, I agree 100% with you that now is the time, and that it is indeed technically possible to give publishers actionable play data they both deserve and need. I thought 2017 was the time, but alas, it was not! And second, that the IAB is also picking up this thread actively.

I think the best chance of success is from both efforts - this one, by the people who really understand the fundamentals, but also aligned with the IAB's efforts, which will bring the buy-side and some platforms into the discussions. The IAB can exert pressure on the industry in the form of standards.

The last thing I'd recommend at this stage is to not talk about listens (which I've done for years, and which I think hurt the cause), but rather to focus on play data. That's really what we're after - publishers want to know if humans actually hit the play button on their content, and which parts of the content they played.

@daveajones
Author

Thanks for this input Bryan. It's good to hear the history of this.

> I think the best chance of success is from both efforts - this one, by the people who really understand the fundamentals, but also aligned with the IAB's efforts, which will bring the buy-side and some platforms into the discussions. The IAB can exert pressure on the industry in the form of standards.

Dan and I will be on "Sounds Profitable" tomorrow to discuss this stuff with Bryan Barletta. I'm hoping to open the broader discussion from that stage since Podcasting 2.0 isn't really concerned with this side of things. It's a personal concern of mine that is probably a separate deal.

> The last thing I'd recommend at this stage is to not talk about listens

Also good advice. @jamescridland has also advised this, and I agree. "Listens" is not an accurate enough term. The only way it could be is if all plays were streams and downloads didn't exist. But, none of us want that.
