Do not require rel=self for discovery #36

cweiske · 2015-05-26T12:00:55Z

The discovery phase currently requires that a document has two relation links:

rel=hub
rel=self

What is the reason for rel=self?

In my eyes, rel=hub should suffice since rel=self will be the URL itself. It should be made optional.

cc @aaronpk @tantek - http://indiewebcamp.com/irc/2015-03-18#t1426690743557
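To make the current requirement concrete, here is a minimal sketch of Link-header discovery as described above. It collects rel=hub and rel=self links and, by default, rejects a response that has no rel=self. The helper names `parse_link_header` and `discover` and the example feed URL are hypothetical. The parser is deliberately simplified, and the sketch only looks at the HTTP Link header, not at links inside the document body.

```python
import re

# Simplified Link header parsing: assumes no commas or semicolons
# inside URLs or quoted parameter values.
LINK_RE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]*)*)')
REL_RE = re.compile(r'\brel\s*=\s*"?([^";]+)"?')


def parse_link_header(value):
    """Return (url, set_of_rel_values) pairs from one Link header value."""
    links = []
    for match in LINK_RE.finditer(value):
        url, params = match.group(1), match.group(2)
        rel = REL_RE.search(params)
        # A rel parameter may hold several space-separated relation types.
        rels = set(rel.group(1).split()) if rel else set()
        links.append((url, rels))
    return links


def discover(link_header, require_self=True):
    """Return (hub_urls, self_url); require_self=True mirrors the current rule."""
    hubs, self_url = [], None
    for url, rels in parse_link_header(link_header):
        if "hub" in rels:
            hubs.append(url)
        if "self" in rels and self_url is None:
            self_url = url
    if not hubs:
        raise ValueError("no rel=hub link found")
    if require_self and self_url is None:
        raise ValueError("no rel=self link found")
    return hubs, self_url
```

Under the current rule, a response carrying only a hub link (such as the header produced by the single Apache directive quoted later in this thread) fails discovery; with `require_self=False` it succeeds, but the subscriber then has no topic URL from the publisher.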


themel · 2015-05-26T12:35:42Z

The problem is canonicalization/feed aliasing. Most feeds can be accessed
under many URLs (HTTP vs HTTPS, multiple hostnames, infinite spaces of
ignored query parameters). The publisher can't/won't ping all of them when
there's an update to the feed. The self link is an explicit promise to ping
the self link topic when the feed changes, and this is the topic that
subscribers should use. If we drop the self link requirement, we can either
let subscribers that ended up on a feed via a URL that is not the canonical
one wait for updates in vain (bad) or make the hub's job much more difficult
because it needs to understand that a ping to http://example.com/feed.xml
might also affect subscribers to https://example.com/feed.xml?foo=bar. This
fits the overall "center complexity in the hub" design approach, but it
would probably lead to a worse user experience because it's hard to do this
kind of aliasing detection reliably.
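The aliasing problem can be illustrated with a toy model, assuming a hub that matches pings to subscriptions by exact topic string. `ExactMatchHub` and the `reader.example` callback URLs are hypothetical; only the two feed URLs come from the comment above.

```python
from collections import defaultdict


class ExactMatchHub:
    """Toy hub that matches publisher pings to subscriptions by exact topic string."""

    def __init__(self):
        # topic URL -> list of subscriber callback URLs
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def ping(self, topic):
        """Handle a publisher ping; return the callbacks that would be notified."""
        # .get() avoids creating an empty entry for unknown topics.
        return list(self.subscribers.get(topic, []))


hub = ExactMatchHub()
# Two subscribers reached the same feed via different aliases.
hub.subscribe("https://example.com/feed.xml?foo=bar", "https://reader.example/cb/1")
hub.subscribe("http://example.com/feed.xml", "https://reader.example/cb/2")

# The publisher pings only the URL it knows about.
notified = hub.ping("http://example.com/feed.xml")
# Only cb/2 is notified; cb/1 waits in vain without any error.
```

Making such a hub alias-aware would require it to guess which URL variations the publisher's server treats as equivalent (for example, that `?foo=bar` is ignored), which is the hard-to-do-reliably part.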

I also expect the gains from this simplification to be small since adding
two links to a feed is basically the same amount of work as adding one link.


cweiske · 2015-05-26T12:49:47Z

Actually, adding the hub link in Apache takes only a single configuration line:

Header append Link '<http://phubb.cweiske.de/hub.php>; rel="hub"'

Adding the self URL is difficult because it's a dynamic URL. So it's not the same amount of work; quite the contrary.

I understand the issue about the same file being available under multiple URLs. But if there is no self link, the publisher would have to make sure that the feed is only available under one URL.
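The difficulty is that a single static directive cannot emit a different self URL per feed. In application code, one way to produce both headers per request is sketched below. The `link_headers()` helper and the canonical origin `https://example.org/` are hypothetical; the hub URL is the one from the Apache line above.

```python
from urllib.parse import urljoin

# Hub URL taken from the Apache example; CANONICAL_BASE is an assumed
# canonical origin for the site's feeds.
HUB_URL = "http://phubb.cweiske.de/hub.php"
CANONICAL_BASE = "https://example.org/"


def link_headers(request_path):
    """Build the Link headers for a feed served at request_path (path only, no query).

    The self URL is derived from the path but always uses the canonical
    origin, so requests via http:// or other host aliases advertise the
    same topic instead of echoing whatever URL the client used.
    """
    self_url = urljoin(CANONICAL_BASE, request_path.lstrip("/"))
    return [
        ("Link", f'<{HUB_URL}>; rel="hub"'),
        ("Link", f'<{self_url}>; rel="self"'),
    ]
```

Deriving the self URL from a fixed origin rather than from the request's Host header is the design choice that keeps aliases from advertising different topics.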

tantek · 2015-05-26T18:50:19Z

I agree with not requiring rel=self.

re: canonicalization - there is prior art here we should be re-using, that is, rel=canonical - which is already well deployed and in use.

Thus here is a specific proposal.

Change: Publishers MUST have a rel=self link at their URL ("the URL")
To: Publishers SHOULD have a rel=self link, but MAY instead:

provide a rel=canonical link (which they might have already) OR
assume rel=self is the same as the URL

Thus consuming code:

looks for a rel=self link, if not found
looks for a rel=canonical link, if not found
uses the current URL
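The consuming-code order in this proposal can be sketched as a small resolver. `resolve_topic` is a hypothetical name, and it assumes links have already been parsed into `(url, set_of_rels)` pairs. Relative URLs are resolved against the fetched URL.

```python
from urllib.parse import urljoin


def resolve_topic(links, current_url):
    """Choose the topic to subscribe to: rel=self, then rel=canonical, then current_url.

    links is an iterable of (url, set_of_rels) pairs, for example parsed
    from the Link header.
    """
    links = list(links)  # iterated once per relation type below
    for wanted in ("self", "canonical"):
        for url, rels in links:
            if wanted in rels:
                return urljoin(current_url, url)
    # Neither link present: assume the fetched URL is the topic.
    return current_url
```

The last fallback is what lets a publisher add only the hub link; it is also the case where a subscriber that arrived via an alias subscribes to a topic the publisher may never ping.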

Regarding: "since adding two links to a feed is basically the same amount of work as adding one link." - absolutely not true in my experience. Example 1: what @cweiske said. Example 2: watching numerous users try to add the TWO links required for OpenID and screwing one of them up (in contrast to people trivially adding one rel=me link required for IndieAuth).

Basically, requiring two links instead of one for the very common case unnecessarily increases publisher responsibility and fragility of the whole system.

julien51 · 2015-05-29T07:34:42Z

I'm very strongly against this because it would add one more case of silent failure. There's http vs https, there are letter-case differences in URLs, and a bunch of other examples. Feedburner is pretty famous for this: if you subscribed to http://feeds.feedburner.com/TechCrunch/ instead of http://feeds.feedburner.com/Techcrunch/, you'd never get pings.

The worst case is redirects: in this specific case, the hub has no way of matching the pinged URL to the actual feed resource.

Again, this is a particularly bad idea because this will silently fail. A subscriber who subscribes to a URL different from the one that is actually pinged to the hub will never receive notifications, and never be able to tell why (because he cannot know which URL is being pinged). THAT makes the protocol fragile.

I'm sorry for anyone working with Apache, but I don't think it's a good idea to base a spec on how hard something is to implement with one specific web server. I believe most web frameworks make adding two Link headers (or 100) as trivial as adding one.

Now, if the whole debate is about whether "canonical" is better than "self", I'll let you argue that out. We can easily change the spec to tell subscribers:

Use self if there is one
Use canonical if you can't find one

And to tell publishers:

Put either self or canonical.
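This variant differs from a "fall back to the fetched URL" rule in one place: when a publisher advertises neither link, the subscriber gets an explicit error instead of a subscription that may silently never fire. A minimal sketch, assuming links are already parsed into `(url, set_of_rels)` pairs; `resolve_topic_strict` and `NoTopicError` are hypothetical names.

```python
from urllib.parse import urljoin


class NoTopicError(ValueError):
    """Publisher advertised neither rel=self nor rel=canonical."""


def resolve_topic_strict(links, current_url):
    """Prefer rel=self, then rel=canonical; fail loudly if neither is present."""
    links = list(links)  # iterated once per relation type below
    for wanted in ("self", "canonical"):
        for url, rels in links:
            if wanted in rels:
                return urljoin(current_url, url)
    # No silent fallback to current_url: a subscription to an alias that
    # is never pinged would otherwise fail without any visible error.
    raise NoTopicError(f"{current_url} advertises neither rel=self nor rel=canonical")
```

Raising here moves the failure from "no notifications, no explanation" to a visible discovery error the subscriber can report.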

romkatv · 2015-05-29T08:50:25Z

On Fri, May 29, 2015 at 9:34 AM, Julien Genestoux [email protected]
wrote:

Feedburner is pretty famous for this and if you subscribed to this URL
http://feeds.feedburner.com/TechCrunch/ instead of this one
http://feeds.feedburner.com/Techcrunch/, you'd never get pings.

Minor correction: subscribing to any of these will work:

http://feeds.feedburner.com/Techcrunch
http://feeds.feedburner.com/TechCrunch
https://feeds.feedburner.com/Techcrunch
http://feedproxy.google.com/Techcrunch
etc.

This doesn't invalidate the point Julien is making. Topic aliasing is a
real problem. Correct self links are vital for ensuring that subscribers
are listening to the exact topics that the publisher is pinging.

Roman.

julien51 · 2015-05-29T08:55:31Z

I stand corrected, but that was a large pain point for a long time. I'm glad you guys fixed it :)

pfefferle mentioned this issue Nov 23, 2016

Drop MUST requirement for rel=self w3c/websub#69 (closed)

voxpelli mentioned this issue Apr 13, 2017

Why is rel=self mandatory? w3c/websub#101 (closed)
