Coverage metrics #38
base: main
Codecov Report
@@            Coverage Diff            @@
##              main       #38   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           44        45    +1
  Lines         2209      2230   +21
=========================================
+ Hits          2209      2230   +21
float
Value of metric.
"""
reco_k_first_ranks = reco[reco[Columns.Rank] <= self.k]
We only need the item column, so let's select just that column.
This will be faster and more memory-efficient:
items = reco.loc[reco[Columns.Rank] <= self.k, Columns.Item]
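A minimal sketch of this suggestion on toy data. The `Columns` class below is a hypothetical stand-in for the project's column-name constants, and the column names and `k` value are assumptions made for illustration only.

```python
import pandas as pd


class Columns:
    # Hypothetical stand-in for the project's column-name constants.
    User = "user_id"
    Item = "item_id"
    Rank = "rank"


# Toy recommendations table (assumed data, not from the PR).
reco = pd.DataFrame(
    {
        Columns.User: [1, 1, 1, 2, 2],
        Columns.Item: [10, 20, 30, 10, 40],
        Columns.Rank: [1, 2, 3, 1, 2],
    }
)
k = 2

# Approach under review: filter every column, then pick the item column.
items_via_frame = reco[reco[Columns.Rank] <= k][Columns.Item]

# Suggested approach: apply the row mask and the column selection in one step,
# so no intermediate DataFrame with all columns is built.
items = reco.loc[reco[Columns.Rank] <= k, Columns.Item]
```

Both expressions select the same rows; the `.loc` form only avoids materializing the columns that are never used.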
Value of metric.
"""
reco_k_first_ranks = reco[reco[Columns.Rank] <= self.k]
return len(reco_k_first_ranks[Columns.Item].unique()) / len(catalog)
There is a nunique method, so there is no need to call len on the result of unique().
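A small sketch comparing the two spellings on toy data. The series values and the `catalog` list are assumptions made for illustration only.

```python
import pandas as pd

# Toy top-k items and catalog (assumed data, not from the PR).
items = pd.Series([10, 20, 10, 40], name="item_id")
catalog = [10, 20, 30, 40, 50]

# Spelling under review: build the array of unique values, then count it.
coverage_len = len(items.unique()) / len(catalog)

# Suggested spelling: ask pandas for the number of distinct values directly.
coverage_nunique = items.nunique() / len(catalog)
```

One difference to keep in mind: `Series.nunique()` ignores missing values by default (`dropna=True`), whereas `unique()` keeps `NaN` as a value, so the two counts can differ if the item column contains missing values.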
reco_k_first_ranks = reco[reco[Columns.Rank] <= self.k]
return len(reco_k_first_ranks[Columns.Item].unique()) / len(catalog)

def calc_per_user(self, reco: pd.DataFrame, catalog: Catalog) -> pd.Series:
Maybe the calc_per_user method is meaningless for this metric: the recommendations for each user are unique, so the per-user coverage depends only on the k that we set here.
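A hedged illustration of this point on toy data, assuming each user's recommendations contain no repeated items. The `Columns` class, the data, and `catalog_size` are hypothetical and only serve the example.

```python
import pandas as pd


class Columns:
    # Hypothetical stand-in for the project's column-name constants.
    User = "user_id"
    Item = "item_id"
    Rank = "rank"


# Toy recommendations: users 1, 2 and 3 received 3, 2 and 1 items respectively.
reco = pd.DataFrame(
    {
        Columns.User: [1, 1, 1, 2, 2, 3],
        Columns.Item: [10, 20, 30, 10, 40, 50],
        Columns.Rank: [1, 2, 3, 1, 2, 1],
    }
)
k = 2
catalog_size = 5  # assumed catalog size

# Per-user number of distinct items within the first k ranks.
top_k = reco[reco[Columns.Rank] <= k]
per_user_unique = top_k.groupby(Columns.User)[Columns.Item].nunique()
per_user_coverage = per_user_unique / catalog_size

# The same numbers follow from the list lengths and k alone.
expected_unique = reco.groupby(Columns.User).size().clip(upper=k)
```

Under that assumption, the per-user value is just `min(k, number of recommendations)` divided by the catalog size, so it says nothing about which items were recommended.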
pd.Series
Values of metric (index - user id, values - metric value for every user).
"""
reco_k_first_ranks = reco[reco[Columns.Rank] <= self.k]
Same here
Values of metric (index - user id, values - metric value for every user).
"""
reco_k_first_ranks = reco[reco[Columns.Rank] <= self.k]
return reco_k_first_ranks.groupby(Columns.User)[Columns.Item].count().rename(None)
It's better to store the results of complex expressions in separate variables.
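One possible way to split the chained expression, sketched on toy data. The `Columns` class and the intermediate variable names are hypothetical and are not taken from the PR.

```python
import pandas as pd


class Columns:
    # Hypothetical stand-in for the project's column-name constants.
    User = "user_id"
    Item = "item_id"
    Rank = "rank"


# Toy recommendations table (assumed data).
reco = pd.DataFrame(
    {
        Columns.User: [1, 1, 1, 2, 2, 3],
        Columns.Item: [10, 20, 30, 10, 40, 50],
        Columns.Rank: [1, 2, 3, 1, 2, 1],
    }
)
k = 2

# Step 1: keep only the rows within the first k ranks.
top_k_reco = reco[reco[Columns.Rank] <= k]

# Step 2: count the recommended items for every user.
items_per_user = top_k_reco.groupby(Columns.User)[Columns.Item].count()

# Step 3: drop the series name so the result is an unnamed per-user series.
num_retrieved = items_per_user.rename(None)
```

Each intermediate name documents one step, which makes the final `return` line short and easier to inspect in a debugger.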
class NumRetrieved(MetricAtK):
"""
Number of recommendations retrieved is a metric that shows
how much items were recommended to users by first k recommendations (less or equal k)
much -> many
Item coverage and num retrieved