title | description | layout | toc |
---|---|---|---|
Schema & Data Formats | Superfeedr's schema is standard ATOM, with a few extra items fit into custom namespaces. | page | |
This section mostly applies to feed subscriptions. If you subscribe to arbitrary content resources, we send you the exact content of the resource you subscribed to. Some of the status information is also available for non-feed resources.
Whatever the original format (RSS, Atom, or any other namespace), the notifications we send to subscribers use standard ATOM, plus a few other namespaces detailed below. We map as much of the original data as we can into this format. The goal is to give subscribers a consistent schema that is easy to consume.
Notifications, subscription responses, and resource content retrieved from Superfeedr may include the following status information, which describes the current state of a resource. Please note that some items may be missing, either because we haven't processed the feed yet or because they wouldn't be accurate.
<table>
<thead>
<tr>
<th>Name</th>
<th>Note</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>status[@feed]</td>
<td></td>
<td>contains the URL of the resource</td>
</tr>
<tr>
<td>title</td>
<td></td>
<td>The feed title</td>
</tr>
<tr>
<td>http[@code]</td>
<td> </td>
<td>the last HTTP status code; see <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html">Status Code Definitions</a></td>
</tr>
<tr>
<td>http</td>
<td> </td>
<td>the content of that tag is a more explicit log message for your information</td>
</tr>
<tr>
<td>next_fetch</td>
<td> </td>
<td>the resource will be fetched again no later than this time</td>
</tr>
<tr>
<td>period</td>
<td> </td>
<td>the polling frequency in seconds for this resource (at least 60 seconds for feeds and at least 300 seconds for arbitrary content)</td>
</tr>
<tr>
<td>last_fetch</td>
<td> </td>
<td>the last time at which we fetched the resource</td>
</tr>
<tr>
<td>last_parse</td>
<td> </td>
<td>the last time at which we parsed the resource. We sometimes fetch a resource without parsing it because its content hasn't been modified</td>
</tr>
<tr>
<td>last_maintenance_at</td>
<td> </td>
<td>Each resource inside Superfeedr has a maintenance cycle that we use to detect stale or related resources. We normally run maintenance at most once every 24 hours for each resource, but this is a low-priority task, so the interval may be longer</td>
</tr>
<tr>
<td>entries_count_since_last_maintenance</td>
<td></td>
<td>The number of updates in the resource since we last ran the maintenance script. This is a very good indicator of the verboseness of a resource. You may want to remove resources that are too verbose</td>
</tr>
<tr>
<td id="velocity">velocity</td>
<td></td>
<td>The number of updates during a maintenance cycle (between 24 and 48 hours). The order of magnitude matters more than the exact number.</td>
</tr>
<tr>
<td id="popularity">popularity</td>
<td></td>
<td>Float. Starts at 0 (not popular). The greater the number, the more popular the feed. Popularity is assessed for each feed based on several signals: the social web, number of clicks, and number of subscribers. It also depends on the popularity of the web pages which link to the feed.</td>
</tr>
<tr>
<td id="porn_rank">porn_rank</td>
<td>if available</td>
<td>Between 0 and 1. The greater the rank, the greater the chances that the feed publishes only porn content.</td>
</tr>
<tr>
<td id="bozo_rank">bozo_rank</td>
<td>if available</td>
<td>Between 0 and 1. The Bozo rank indicates that a feed is probably valid syntactically but likely invalid semantically: feeds with constantly changing unique identifiers for new entries will rank high, for example.</td>
</tr>
<tr>
<td id="generated_ids">generated_ids</td>
<td>true</td>
<td>Indicates whether the <code>id</code> for each entry was generated by Superfeedr. If this is missing, you can safely assume that we were able to extract the unique id from the feed itself.</td>
</tr>
</tbody>
</table>
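
As a rough illustration of how a subscriber might read the status fields listed above, here is a minimal Python sketch. The sample payload is hypothetical: the element names come from the table, but the nesting, the namespace URI, and every value are assumptions made for the example, not Superfeedr's documented output. Tags are matched by local name so the sketch does not depend on the real namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: names follow the table above; the nesting, the
# namespace URI and all values are assumptions made for illustration.
SAMPLE_STATUS = """
<status xmlns="http://example.com/hypothetical-status-ns" feed="http://domain.tld/feed.xml">
  <http code="200">Fetched and parsed</http>
  <period>900</period>
  <velocity>3.5</velocity>
  <bozo_rank>0.1</bozo_rank>
</status>
"""

INT_FIELDS = ("period", "entries_count_since_last_maintenance")
FLOAT_FIELDS = ("velocity", "popularity", "porn_rank", "bozo_rank")


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_status(xml_text):
    """Return the status fields as a dict; missing fields are simply absent."""
    root = ET.fromstring(xml_text)
    status = {"feed": root.get("feed")}
    for child in root:
        name = local_name(child.tag)
        text = (child.text or "").strip()
        if name == "http":
            code = child.get("code")
            status["http_code"] = int(code) if code else None
            status["http_message"] = text
        elif not text:
            continue  # skip empty elements rather than guessing a value
        elif name in INT_FIELDS:
            status[name] = int(text)
        elif name in FLOAT_FIELDS:
            status[name] = float(text)
        else:
            status[name] = text
    return status


status = parse_status(SAMPLE_STATUS)
print(status["feed"], status["http_code"], status.get("period"))
```

Because any field may be missing, callers should read optional values with `status.get(...)` rather than indexing directly.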
Notification entries have the following form, which is standard ATOM. Please note that an entry might not include every element described here.
Here are the components used to build the entries. Please note that they may use specific namespaces.
<table>
<thead>
<tr>
<th>Name</th>
<th>Note</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>link[@href]</td>
<td></td>
<td>the URL related to the parent node</td>
</tr>
<tr>
<td>link[@rel]</td>
<td>optional</td>
<td>the type of relation to the parent node (alternate, reply, etc.)</td>
</tr>
<tr>
<td>link[@type]</td>
<td>optional</td>
<td>MIME type of the link destination (text/html by default)</td>
</tr>
<tr>
<td>link[@title]</td>
<td>optional</td>
<td>the link title</td>
</tr>
</tbody>
</table>
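
Since `rel` and `type` are optional, a consumer usually needs a default when choosing which link to follow. The sketch below is one possible approach, assuming the links have already been read into plain dicts of their attributes. The `pick_link` helper and the sample URLs are hypothetical. Treating a missing `type` as `text/html` follows the table above; treating a missing `rel` as `alternate` follows the Atom specification (RFC 4287) rather than anything stated on this page.

```python
def pick_link(links, rel="alternate", mime="text/html"):
    """Return the href of the first link matching rel and type, else None.

    Each link is a dict of the attributes described in the table above.
    A missing type counts as text/html (per the table); a missing rel
    counts as "alternate" (per RFC 4287).
    """
    for link in links:
        link_rel = link.get("rel", "alternate")
        link_type = link.get("type", "text/html")
        if link_rel == rel and link_type == mime:
            return link.get("href")
    return None


# Hypothetical links for a single entry.
links = [
    {"href": "http://domain.tld/entry/1.json", "rel": "alternate", "type": "application/json"},
    {"href": "http://domain.tld/entry/1", "title": "Entry 1"},
    {"href": "http://domain.tld/entry/1/comments", "rel": "replies"},
]

print(pick_link(links))
print(pick_link(links, rel="replies"))
```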
{% prism markup %}
{% endprism %}
Name | Note | Value |
---|---|---|
category[@term] | optional, multiple | a keyword related to the entry... (tag, category or topic) |
{% prism markup %} {% endprism %}
Name | Note | Value |
---|---|---|
entry[@point] | optional, multiple | geolocation data. Contains a [georss](http://georss.org/) latitude and longitude. It's either extracted from the story or extrapolated from the content. |
{% prism markup %}
<georss:point>47.597553 -122.15925</georss:point>
{% endprism %}
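
A GeoRSS simple point is a latitude followed by a longitude, separated by whitespace. As a small sketch of turning that text into numbers, the hypothetical `parse_point` helper below splits the value and checks that both coordinates are in range. It assumes you have already extracted the element's text.

```python
def parse_point(text):
    """Parse a GeoRSS simple point ("latitude longitude") into two floats."""
    parts = text.split()
    if len(parts) != 2:
        raise ValueError(f"expected 'latitude longitude', got {text!r}")
    lat, lon = float(parts[0]), float(parts[1])
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {text!r}")
    return lat, lon


lat, lon = parse_point(" 47.597553 -122.15925 ")
print(lat, lon)
```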
Name | Note | Value |
---|---|---|
author | optional, multiple | Author information |
name | optional | the author's name (or nickname) |
email | optional | the author's email address |
uri | optional | the author's URI |
object-type | optional, multiple | the object type, defined in the ActivityStreams spec |
link | optional, multiple | links (see above). They can include links to the author's profile, to the user's avatar... |
{% prism markup %}
<author>
  <name>John Doe</name>
  <email>[email protected]</email>
  <uri>http://twitter.com/superfeedr</uri>
  <as:object-type>http://activitystrea.ms/schema/1.0/person</as:object-type>
</author>
{% endprism %}
Name | Note | Value |
---|---|---|
object | optional, multiple | ActivityStreams |
object-type | optional, multiple | the object type, defined in the ActivityStreams spec |
id | optional | the unique identifier of the object |
title | optional | the title of the object |
published | optional | the publication date (iso8601) of the object |
updated | optional | the updated date (iso8601) of the object |
content | optional | the content of the object |
author | optional, multiple | author information (see above) |
category | optional, multiple | categories (see above) |
link | optional, multiple | links (see above) |
{% prism markup %}
<as:object>
  <as:object-type>http://gowalla.com/schema/1.0/spot</as:object-type>
  <as:object-type>http://activitystrea.ms/schema/1.0/place</as:object-type>
  <id>object-id</id>
  <title>Title of the Object</title>
  <published>2013-04-20T15:00:40+02:00</published>
  <updated>2013-04-21T14:00:40+02:00</updated>
  <content>hello world</content>
  <author>
    <name>Second</name>
    <uri>http://domain.tld/second</uri>
    <email>[email protected]</email>
  </author>
</as:object>
{% endprism %}
Name | Note | Value |
---|---|---|
verb | optional, multiple | defined in the ActivityStreams spec |
{% prism markup %}
<as:verb>http://activitystrea.ms/schema/1.0/post</as:verb>
{% endprism %}
Entries may include all the above elements. They also contain specific nodes, listed below.
Name | Note | Value |
---|---|---|
entry[@xml:lang] | optional | The language of the entry. It's either extracted or computed from the content (the longer the content, the more relevant). |
entry[@title] | | The title of the entry. |
entry[@published] | optional | The publication date (iso8601) of the entry. |
entry[@updated] | optional | The last updated date (iso8601) of the entry. |
entry[@content] | optional | The content of the entry. Check the type attribute to determine the mime-type. |
entry[@summary] | optional | The summary of the entry. Check the type attribute to determine the mime-type. |
entry[@source] | optional | The source of the entry. It includes all the available feed level elements, such as the feed title, the feed links, the feed's author(s)... etc. It's extremely useful for track feeds. |
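
Because most of these fields are optional, consumer code should tolerate their absence. The following Python sketch shows one way to read the language, title, and ISO 8601 dates from an entry. The sample entry and the `parse_entry` helper are hypothetical, and the sketch assumes the elements live in the standard Atom namespace (`http://www.w3.org/2005/Atom`).

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical entry used only for illustration; note that <updated> is absent.
SAMPLE_ENTRY = """
<entry xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <id>domain.tld:09/05/03-1</id>
  <published>2013-04-21T14:00:40+02:00</published>
  <title>Hypothetical entry</title>
</entry>
"""


def parse_entry(xml_text):
    """Extract a few entry fields; optional ones come back as None."""
    entry = ET.fromstring(xml_text)

    def text_of(name):
        node = entry.find(ATOM + name)
        return node.text.strip() if node is not None and node.text else None

    def date_of(name):
        value = text_of(name)
        # fromisoformat accepts offsets such as +02:00 (Python 3.7+).
        return datetime.fromisoformat(value) if value else None

    return {
        "lang": entry.get(XML_LANG),
        "id": text_of("id"),
        "title": text_of("title"),
        "published": date_of("published"),
        "updated": date_of("updated"),
    }


parsed = parse_entry(SAMPLE_ENTRY)
print(parsed["lang"], parsed["title"], parsed["published"], parsed["updated"])
```

The parsed dates keep their UTC offset, so they can be compared or converted to UTC without guessing the publisher's time zone.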
{% prism markup %}
<id>domain.tld:09/05/03-1</id>
<published>2013-04-21T14:00:40+02:00</published>
<updated>2013-04-21T14:00:40+02:00</updated>
<title>Entry published on hour ago</title>
<content>Entry published on hour ago when it was shinny outside, but now it's raining</content>