Coverage metrics #38

Open
wants to merge 4 commits into
base: main

Contributor

@jegorus jegorus commented May 4, 2023

Item coverage and num retrieved

codecov bot commented May 22, 2023

Codecov Report

Merging #38 (4d0f0d0) into main (eee3ba5) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main       #38   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           44        45    +1     
  Lines         2209      2230   +21     
=========================================
+ Hits          2209      2230   +21     
Impacted Files Coverage Δ
rectools/metrics/__init__.py 100.00% <100.00%> (ø)
rectools/metrics/coverage.py 100.00% <100.00%> (ø)

float
Value of metric.
"""
reco_k_first_ranks = reco[reco[Columns.Rank] <= self.k]
Collaborator

Since we only need the item column, let's select just that column.

This will be more memory-efficient and faster as well.

items = reco.loc[reco[Columns.Rank] <= self.k, Columns.Item]
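
A minimal sketch of the suggestion above, on made-up data. The string column names "user_id", "item_id" and "rank" are stand-ins for Columns.User, Columns.Item and Columns.Rank, and the toy frame is purely illustrative:

```python
import pandas as pd

# Hypothetical toy recommendations; the column names are stand-ins for
# Columns.User, Columns.Item and Columns.Rank.
reco = pd.DataFrame(
    {
        "user_id": [1, 1, 1, 2, 2, 2],
        "item_id": [10, 20, 30, 10, 40, 50],
        "rank": [1, 2, 3, 1, 2, 3],
    }
)
k = 2

# Filtering the whole frame keeps every column for the matching rows.
reco_k_first_ranks = reco[reco["rank"] <= k]

# Selecting the rows and the single column in one .loc call yields a Series.
items = reco.loc[reco["rank"] <= k, "item_id"]

# Both approaches give the same item values; only the intermediate differs.
same_values = items.equals(reco_k_first_ranks["item_id"])
```

The .loc form never materializes the filtered user and rank columns, which is where the memory saving would come from.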

Value of metric.
"""
reco_k_first_ranks = reco[reco[Columns.Rank] <= self.k]
return len(reco_k_first_ranks[Columns.Item].unique()) / len(catalog)
Collaborator

There is a nunique method, so there is no need to use len.
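
As a rough illustration of this point, the two spellings can be compared on a small Series. The item ids and the catalog list below are invented for the example:

```python
import pandas as pd

# Hypothetical items recommended within the first k ranks.
items = pd.Series([10, 20, 10, 40], name="item_id")
# Hypothetical catalog of all item ids.
catalog = [10, 20, 30, 40, 50]

# Original spelling: build the array of unique values, then take its length.
coverage_with_len = len(items.unique()) / len(catalog)

# Suggested spelling: ask pandas for the number of unique values directly.
coverage_with_nunique = items.nunique() / len(catalog)
```

One difference to keep in mind: nunique drops missing values by default (dropna=True), while unique keeps them, so the two can disagree if the item column contains NaN.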

reco_k_first_ranks = reco[reco[Columns.Rank] <= self.k]
return len(reco_k_first_ranks[Columns.Item].unique()) / len(catalog)

def calc_per_user(self, reco: pd.DataFrame, catalog: Catalog) -> pd.Series:
Collaborator

For this metric, the calc_per_user method may be meaningless: recommendations for a single user are unique, so the per-user coverage depends only on the k that we set here.
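
To make the concern concrete, here is a small sketch. It assumes per-user coverage would be defined as the number of unique items in a user's top k divided by the catalog size; that definition, the column names and the data are assumptions for illustration, not taken from the PR:

```python
import pandas as pd

# Hypothetical recommendations; items are unique within each user.
reco = pd.DataFrame(
    {
        "user_id": [1, 1, 1, 2, 2, 3],
        "item_id": [10, 20, 30, 10, 40, 50],
        "rank": [1, 2, 3, 1, 2, 1],
    }
)
catalog = [10, 20, 30, 40, 50]  # hypothetical catalog
k = 2

top_k = reco[reco["rank"] <= k]

# Assumed per-user definition: unique items in the user's top k over catalog size.
per_user_coverage = top_k.groupby("user_id")["item_id"].nunique() / len(catalog)
```

Under that assumption, every user with at least k recommendations gets exactly k / len(catalog), so the per-user values say little about the model itself.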

pd.Series
Values of metric (index - user id, values - metric value for every user).
"""
reco_k_first_ranks = reco[reco[Columns.Rank] <= self.k]
Collaborator

Same here

Values of metric (index - user id, values - metric value for every user).
"""
reco_k_first_ranks = reco[reco[Columns.Rank] <= self.k]
return reco_k_first_ranks.groupby(Columns.User)[Columns.Item].count().rename(None)
Collaborator

It's better to store the results of complex expressions in separate variables.
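
One possible way to split the chained expression, sketched on invented data. The variable names items_by_user and num_retrieved are hypothetical, and the string column names stand in for Columns.User, Columns.Item and Columns.Rank:

```python
import pandas as pd

# Hypothetical recommendations for three users.
reco = pd.DataFrame(
    {
        "user_id": [1, 1, 1, 2, 2, 3],
        "item_id": [10, 20, 30, 10, 40, 50],
        "rank": [1, 2, 3, 1, 2, 1],
    }
)
k = 2

# Step 1: keep only the first k ranks.
reco_k_first_ranks = reco[reco["rank"] <= k]

# Step 2: group the item column by user.
items_by_user = reco_k_first_ranks.groupby("user_id")["item_id"]

# Step 3: count the items per user and drop the Series name.
num_retrieved = items_by_user.count().rename(None)
```

Each intermediate then has a name that can be inspected in a debugger, and the return statement reduces to a single variable.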

class NumRetrieved(MetricAtK):
"""
Number of recommendations retrieved is a metric that shows
how much items were recommended to users by first k recommendations (less or equal k)
Collaborator

much -> many


2 participants