The evaluation measure reflects typical use cases at XING: users are presented with their top-k personalized recommendations, and a user interaction with one of the top-k is counted as a success.
The task of the challenge is to compute up to 30 recommendations for each of the 150,000 target users. In particular, the algorithms have to predict the items that a user will interact with. Interpreted as a ranking problem, the task reads as follows: given a user, the recommender algorithm has to compute a ranked list of items such that the items the user is most likely to interact with appear at the top of the ranking.
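For illustration, one common way to realize this ranking view is to score every candidate item for a user and keep the 30 highest-scoring ones. A minimal Python sketch, where predict_score(user, item) is a hypothetical placeholder for whatever model a team uses:

def rank_items(user, candidate_items, predict_score, k=30):
    # sort candidates by predicted relevance, most likely interactions first
    ranked = sorted(candidate_items,
                    key=lambda item: predict_score(user, item),
                    reverse=True)
    return ranked[:k]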
The algorithms can therefore exploit all the training data. The submission system measures the quality of the recommendations, which were generated by your algorithm for each of the target users, based on the scoring function score(S, T). Here, S is the solution that you are uploading to the system and T is the test set that describes which items are relevant to which user. T is, of course, not accessible to the teams.
Input:
- Test set: T = The test set consists of 150,000 (user, relevantItems) tuples. For a given user (contained in the target users), relevantItems is the set of items that are considered relevant:
  - Relevant items are those items on which the user performed a "positive interaction" in the test period.
  - We count clicks, bookmarks and clicks on the reply button as positive interactions.
- Solution: S = The task is to compile, for each user, an ordered list of items/recommendations that has high overlap with the relevantItems for that user. Teams should predict up to 30 items for each of the 150k target users. S(u) = recommendedItems = the ordered list of items that are recommended to user u (may be empty if no items are recommended to the user).
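For concreteness, the two inputs can be pictured as plain Python data structures (a sketch; the user and item ids below are made up, and the challenge does not prescribe these types):

from typing import Dict, List, Set

# T: hidden test set, user -> set of relevant items
T: Dict[int, Set[int]] = {
    1001: {42, 77, 310},   # user 1001 positively interacted with these items
}

# S: uploaded solution, user -> ordered list of at most 30 recommended items
S: Dict[int, List[int]] = {
    1001: [42, 15, 77],    # ranked recommendations; may be empty
}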
score(S, T):

// main scoring function:
function score(S, T) = {
  score = 0.0
  foreach (u, t) in T:  // t = set of relevant items for user u
    r = S(u)            // r = ordered list of recommended items for user u
    score += 20 * (precisionAtK(r, t, 2) + precisionAtK(r, t, 4) + recall(r, t) + userSuccess(r, t))
           + 10 * (precisionAtK(r, t, 6) + precisionAtK(r, t, 20))
  return score
}
// precision within the first top k items:
function precisionAtK(recommendedItems, relevantItems, k) = {
  topK = recommendedItems.take(k)  // takes the first k items from the list of recommendedItems
  return intersect(topK, relevantItems).size / k
}
// recall = fraction of the relevant items that were retrieved
// (at most 30 items may be submitted per user):
function recall(recommendedItems, relevantItems) = {
  if (relevantItems.size > 0)
    return intersect(recommendedItems.take(30), relevantItems).size / relevantItems.size
  else
    return 0.0
}
// user success = was at least one relevant item recommended for the given user?
function userSuccess(recommendedItems, relevantItems) = {
  if (intersect(recommendedItems.take(30), relevantItems).size > 0)
    return 1.0
  else
    return 0.0
}
Additional remarks:
- intersect(r, t) returns the intersection between the (top-k) list of recommended items r and the set of relevant items t.
- intersect(r, t).size returns the size of that intersection.
- For each user, one can thus earn at most 100 points (given that there exist 20 relevant items for the user).
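To see where the 100-point maximum comes from: if a user has exactly 20 relevant items and all of them appear within the first 20 positions of the recommendation list, then precision@2 = precision@4 = precision@6 = precision@20 = 1.0, recall = 20/20 = 1.0 and userSuccess = 1.0, which yields 20 * (1 + 1 + 1 + 1) + 10 * (1 + 1) = 100 points. Using the Python sketch above with hypothetical item ids:

relevant = set(range(20))       # 20 relevant items
recommended = list(range(20))   # all of them ranked in the top 20
print(score({0: recommended}, {0: relevant}))   # 100.0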