Trackers

Superfeedr allows you to track entries for a certain query, as opposed to single feeds. Here's how to set it up.

Introduction

You may want to subscribe to any entry that matches a certain query, across all feeds, rather than subscribing to individual feeds. The most common use case is to subscribe to any entry that matches a given keyword.

Only "Tracker" users can use track feeds. However, the API calls for subscribing, unsubscribing, listing, or retrieving past entries are the same ones that "Subscriber" users use for regular feeds, through our PubSubHubbub API.

Building Track Feeds

Track feeds are virtual feeds in the sense that they're generated on the fly, as long as their URL matches the following criteria:

  • Can be http or https
  • Uses the track.superfeedr.com hostname
  • Has a query string parameter named query whose value is the query itself.

Here's an example of a track feed: http://track.superfeedr.com/?query=superfeedr. You can also add a format query string parameter with atom or json as its value. Refer to our schema section for details on both Atom and JSON.
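The criteria above can be sketched in a few lines of Python. This is only an illustration: the helper name `track_feed_url` and its arguments are hypothetical, and it assumes the query must be URL-encoded like any other query string value.

```python
from urllib.parse import urlencode

TRACK_HOST = "track.superfeedr.com"  # hostname required by the criteria above


def track_feed_url(query, fmt=None, secure=True):
    """Build a track feed URL for a query (hypothetical helper, not a Superfeedr API)."""
    if not query.strip():
        raise ValueError("query must not be empty")
    params = {"query": query}
    if fmt is not None:
        if fmt not in ("atom", "json"):
            raise ValueError("format must be 'atom' or 'json'")
        params["format"] = fmt
    scheme = "https" if secure else "http"
    # urlencode escapes spaces (as '+'), quotes, pipes, and parentheses.
    return f"{scheme}://{TRACK_HOST}/?{urlencode(params)}"


print(track_feed_url("superfeedr", secure=False))
print(track_feed_url("paris (texas | france)", fmt="json"))
```

The resulting URL is what you would then use as the feed URL when subscribing.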

Queries

Queries are the equivalent of search queries. They contain one or more members separated by spaces. Each member is either a keyword or a flag with an associated value. Spaces are interpreted as AND.

Some special characters are also supported:

  • + signifies AND operation
  • - negates the following member
  • | signifies OR operation
  • " (quote) wraps a number of tokens to signify a phrase for searching
  • ( and ) indicate precedence

Here are valid examples of queries:

<table>
  <thead>
    <tr>
      <th>Query</th>
      <th>Details</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td><code>superfeedr</code></td>
    <td>will match any mention of <code>superfeedr</code>.</td>
  </tr>
  <tr>
    <td><code>laurel hardy</code></td>
    <td>will match any mention of <code>laurel</code> AND <code>hardy</code> in the same document. The two words can be apart.</td>
  </tr>
  <tr>
    <td><code>romeo -juliette</code></td>
    <td>will match any mention of <code>romeo</code> that does not have a mention of <code>juliette</code>.</td>
  </tr>
  <tr>
    <td><code>paris (texas | france)</code></td>
    <td>will match any mention of <code>paris</code> along with <code>texas</code> or <code>france</code>.</td>
  </tr>
  <tr>
    <td><code>"new york"</code></td>
    <td>will match any mention of <code>new york</code> (with <code>new</code> and <code>york</code> being consecutive).</td>
  </tr>
  <tr>
    <td><code>"AAPL"</code></td>
    <td>will match any mention of <code>AAPL</code> but not <code>aapl</code>.</td>
  </tr>
  </tbody>
</table>
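If you build queries programmatically, the operators listed above can be wrapped in small helpers. The following is a minimal sketch; the function names (`phrase`, `exclude`, `any_of`, `all_of`) are hypothetical and only assemble strings, they do not validate the query against Superfeedr.

```python
def phrase(text):
    """Wrap several tokens in quotes so they are searched as a phrase."""
    if '"' in text:
        raise ValueError("a phrase must not contain a double quote")
    return f'"{text}"'


def exclude(member):
    """Negate a member with a leading '-'."""
    return f"-{member}"


def any_of(*members):
    """Join members with '|' (OR) inside parentheses to set precedence."""
    if not members:
        raise ValueError("any_of needs at least one member")
    return "(" + " | ".join(members) + ")"


def all_of(*members):
    """Join members with spaces, which are interpreted as AND."""
    return " ".join(members)


print(all_of("paris", any_of("texas", "france")))  # paris (texas | france)
print(all_of("romeo", exclude("juliette")))        # romeo -juliette
print(phrase("new york"))                          # "new york"
```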

Site

The site: flag restricts matches to content published on a given site. The value is a hostname (a full domain or a subdomain) and is matched against the host from which the content was published.

You can add at most one site: per query.

Examples:

<table>
  <thead>
    <tr>
      <th>Query</th>
      <th>Details</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td><code>pubsubhubbub site:blog.superfeedr.com</code></td>
    <td>will match any mention of <code>pubsubhubbub</code> published on our blog.</td>
  </tr>
  <tr>
    <td><code>superfeedr site:techmeme.com</code></td>
    <td>will match any mention of <code>superfeedr</code> published on Techmeme</td>
  </tr>
  </tbody>
</table>

You can negate this flag by using -site: instead. Unlike site:, you can use several -site: flags in the same query.

This is useful when refining tracking feeds for which a lot of content is coming from the same sources.

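<table>
  <thead>
    <tr>
      <th>Query</th>
      <th>Details</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td><code>apple -site:techmeme.com -site:techcrunch.com</code></td>
    <td>will match any mention of <code>apple</code> unless it comes from either Techmeme or TechCrunch.</td>
  </tr>
  </tbody>
</table>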

Link

The link: flag allows you to select only the documents that include a link to a specific page or domain.

You can add at most one link: per query.

<table>
  <thead>
    <tr>
      <th>Query</th>
      <th>Details</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td><code>link:https://superfeedr.com/</code></td>
    <td>Only entries with a link to our home page.</td>
  </tr>
  <tr>
    <td><code>link:runscope.com</code></td>
    <td>Only entries with a link to any page on the <code>runscope.com</code> domain.</td>
  </tr>
  </tbody>
</table>

As with site:, you can negate this flag by using -link: instead. You can use multiple -link: flags in the same query.

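<table>
  <thead>
    <tr>
      <th>Query</th>
      <th>Details</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td><code>pubsubhubbub -link:superfeedr.com -link:google.com</code></td>
    <td>will match any mention of <code>pubsubhubbub</code> unless it points to either <code>superfeedr.com</code> or <code>google.com</code>.</td>
  </tr>
  </tbody>
</table>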

Language

Superfeedr is able to extract the language of every entry individually. This means you can filter entries matching a specific language using language:, or exclude those in specific languages using -language:. A given entry cannot have more than one language, which means you can't use more than one language: operand per query. However, you can exclude multiple languages using multiple -language: operands. The value should be the two-letter ISO 639-1 code of the language.

Please note that in some cases, we are unable to extract the language (not enough text, contradictory text combining two languages, etc.).

<table>
  <thead>
    <tr>
      <th>Query</th>
      <th>Details</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td><code>language:en</code></td>
    <td>will match only entries explicitly using the English language. If we are unable to extract the language, this won't match.</td>
  </tr>
  <tr>
    <td><code>-language:it</code></td>
    <td>will exclude entries which use the Italian language. If the language can't be determined, the entries will not match.</td>
  </tr>
  </tbody>
</table>
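The limits described so far (at most one site:, one link:, and one language: per query, with any number of negated flags) can be checked on your side before subscribing. Here is a minimal sketch; the function name `check_flag_counts` is hypothetical, and it assumes that flag-like text inside a quoted phrase is not treated as a flag.

```python
import re
from collections import Counter

# Flags that may appear at most once in their positive form.
SINGLE_USE_FLAGS = ("site", "link", "language")


def check_flag_counts(query):
    """Return a list of problems; the list is empty if the limits are respected."""
    # Drop quoted phrases so flag-like text inside a phrase is not counted
    # (an assumption about how phrases are interpreted).
    unquoted = re.sub(r'"[^"]*"', " ", query)
    counts = Counter()
    for member in unquoted.split():
        if member.startswith("-"):
            continue  # negated flags such as -site: may be repeated
        name, separator, _value = member.partition(":")
        if separator and name in SINGLE_USE_FLAGS:
            counts[name] += 1
    return [
        f"{name}: appears {counts[name]} times, at most one is allowed"
        for name in SINGLE_USE_FLAGS
        if counts[name] > 1
    ]


print(check_flag_counts("apple -site:techmeme.com -site:techcrunch.com"))
print(check_flag_counts("apple site:techmeme.com site:techcrunch.com"))
```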

Popularity

Each feed going through Superfeedr has a popularity ranking. This popularity is a combination of multiple signals and factors: some of them internal and others external (social networks, PageRank, etc.). Any feed's popularity evolves slowly. You can build filters which take the popularity of the source into account and exclude content coming from unpopular sources.

The value should be expressed as a range, using > or <, to match a popularity greater or smaller than a specific value. Check this blog post to learn about the distribution of these ranges.

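<table>
  <thead>
    <tr>
      <th>Query</th>
      <th>Details</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td><code>popularity:>3</code></td>
    <td>will match only entries published in feeds with a popularity greater than 3.</td>
  </tr>
  </tbody>
</table>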

Porn Filtering

By default, Superfeedr tries to identify porn content and will filter it out of matching requests. Please note that this algorithm "learns" from any given feed before it can start to classify an entry as porn, which means that if the exclusion of porn content is an absolute requirement, you should also implement filtering on your side.

We consider any feed with a porn rank higher than 0.2 to be porn.

That said, in some cases (building porn filters, for example!), it makes sense to disable our porn filter. You can achieve this by adding porn:ok to your query.

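<table>
  <thead>
    <tr>
      <th>Query</th>
      <th>Details</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td><code>porn:ok plug</code></td>
    <td>will send any mention of <code>plug</code>, whether it comes from a porn feed or not.</td>
  </tr>
  </tbody>
</table>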

Bozo Filtering

Similar to porn filtering, Superfeedr is able to filter out matching entries from feeds we consider broken or spammy. For example, some feeds will generate infinite amounts of data using a random id for each new entry.

We consider any feed with a bozo rank higher than 0.3 to be bozo.

You can disable this filtering by using bozo:ok.

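<table>
  <thead>
    <tr>
      <th>Query</th>
      <th>Details</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td><code>bozo:ok ham</code></td>
    <td>will send any mention of <code>ham</code>, whether it comes from a spam feed or not.</td>
  </tr>
  </tbody>
</table>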
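The popularity, porn, and bozo flags all describe the source feed rather than the entry, so it can be convenient to append them in one place. The following is a minimal sketch; the helper name `with_source_filters` and its arguments are hypothetical, and it assumes the position of a flag within the query does not matter.

```python
def with_source_filters(query, min_popularity=None, allow_porn=False, allow_bozo=False):
    """Append source-level flags to a query string (hypothetical helper)."""
    members = [query]
    if min_popularity is not None:
        # popularity:>N keeps only feeds with a popularity greater than N.
        members.append(f"popularity:>{min_popularity}")
    if allow_porn:
        members.append("porn:ok")  # disables the default porn filter
    if allow_bozo:
        members.append("bozo:ok")  # disables the default bozo filter
    return " ".join(members)


print(with_source_filters("apple", min_popularity=3))
print(with_source_filters("plug", allow_porn=True, allow_bozo=True))
```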

Testing

Tracking feeds are prospective, which means that you should use them to receive upcoming entries that match them. Because of that, it's not always simple to refine search queries because you have to wait for matches to improve them.

Superfeedr offers a search API which lets you match your tracking feed queries against historical data. You can then quickly identify how to refine your queries for tracking feeds.

POST https://push.superfeedr.com
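<table>
  <thead>
    <tr>
      <th>Parameter Name</th>
      <th>Note</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td><code>hub.mode</code></td>
    <td>required</td>
    <td><code>search</code></td>
  </tr>
  <tr>
    <td><code>query</code></td>
    <td>required</td>
    <td>The query you want to match. Please see previous sections on how to build search queries.</td>
  </tr>
  <tr>
    <td><code>format</code></td>
    <td>optional</td>
    <td><code>json</code> or <code>atom</code> (default).</td>
  </tr>
  </tbody>
</table>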

Example

{% prism markup %}
curl https://push.superfeedr.com/ -X POST -u demo:demo -d'hub.mode=search' -d'query=superfeedr'
{% endprism %}

Response

Superfeedr will return 200 with the corresponding representation of the search results matching your query. Please refer to our schema section for details.

If you receive a 422 HTTP response, please check the body, as it will include the reason for the failure.

Other HTTP response codes are outlined in the HTTP spec.
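For reference, here is a minimal Python sketch of the same search call using only the standard library. The function names are hypothetical; it assumes HTTP Basic authentication (as implied by the `-u demo:demo` option of the curl example) and form-encoded parameters.

```python
import base64
import urllib.error
import urllib.request
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://push.superfeedr.com/"


def build_search_request(query, login, password, fmt=None):
    """Prepare the POST request without sending it (hypothetical helper)."""
    fields = {"hub.mode": "search", "query": query}
    if fmt is not None:
        fields["format"] = fmt  # "json" or "atom"
    token = base64.b64encode(f"{login}:{password}".encode()).decode()
    return urllib.request.Request(
        SEARCH_ENDPOINT,
        data=urlencode(fields).encode(),
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


def search(query, login, password, fmt="json"):
    """Send the search and return (status, body), including for error responses."""
    request = build_search_request(query, login, password, fmt)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # For a 422, the body includes the reason for the failure.
        return error.code, error.read().decode("utf-8", errors="replace")
```

Calling `search("superfeedr", "demo", "demo")` would then mirror the curl example, returning the status code together with the response body.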

Scope

The scope for tracking feeds is the total number of feeds processed by Superfeedr. This includes feeds subscribed to by subscribers, as well as feeds published by publishers that have at least one subscriber on their hubs.

We are working on extending this coverage to include any subscribed feed on the web.