Correction to M-test with resampling, and addition of new MLL test #268
base: main
Changes from all commits
65ffe27
c8a2964
eeba927
d71efcf
43d57d2
5f32fbd
3b24128
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,7 @@ | ||
import numpy | ||
import scipy.stats | ||
import scipy.special | ||
# PyCSEP imports | ||
from csep.core import regions | ||
import scipy.stats | ||
|
||
|
||
def sup_dist(cdf1, cdf2): | ||
""" | ||
|
@@ -269,3 +268,61 @@ def get_Kagan_I1_score(forecasts, catalog): | |
I_1[j] = numpy.dot(counts[non_zero_idx], numpy.log2(rate_den[non_zero_idx] / uniform_forecast)) / n_event | ||
|
||
return I_1 | ||
|
||
|
||
def log_d_multinomial(x: numpy.ndarray, size: int, prob: numpy.ndarray): | ||
""" | ||
|
||
Args: | ||
x: | ||
size: | ||
prob: | ||
|
||
Returns: | ||
|
||
""" | ||
return scipy.special.loggamma(size + 1) + numpy.sum( | ||
x * numpy.log(prob) - scipy.special.loggamma(x + 1)) | ||
|
||
|
||
def MLL_score(union_catalog_counts: numpy.ndarray, catalog_counts: numpy.ndarray): | ||
""" | ||
Calculates the modified multinomial log-likelihood (MLL) score defined by Serafini et al. | ||
(2024). It is built from a collection of catalogs Λ_u and a single catalog Ω: | ||
|
||
MLL_score = 2 * log( L(Λ_u + N_u / N_j + Ω + 1) / | ||
[L(Λ_u + N_u / N_j) * L(Ω + 1)] | ||
) | ||
where N_u and N_j are the total number of events in Λ_u and Ω, respectively. | ||
|
||
Args: | ||
union_catalog_counts (numpy.ndarray): Binned event counts for the union of catalogs Λ_u. | ||
catalog_counts (numpy.ndarray): Binned event counts for the single catalog Ω. | ||
|
||
Returns: | ||
The MLL score for the collection of catalogs and the single catalog. | ||
""" | ||
|
||
N_u = numpy.sum(union_catalog_counts) | ||
N_j = numpy.sum(catalog_counts) | ||
events_ratio = N_u / N_j | ||
|
||
union_catalog_counts_mod = union_catalog_counts + events_ratio | ||
I renamed Lambda_U and Lambda_j to more Pythonic variable names (no leading capitals). Let me know if you agree or have another suggestion.
That is okay with me. |
||
catalog_counts_mod = catalog_counts + 1 | ||
merged_catalog_j = union_catalog_counts_mod + catalog_counts_mod | ||
|
||
pr_merged_cat = merged_catalog_j / numpy.sum(merged_catalog_j) | ||
pr_union_cat = union_catalog_counts_mod / numpy.sum(union_catalog_counts_mod) | ||
pr_cat_j = catalog_counts_mod / numpy.sum(catalog_counts_mod) | ||
|
||
log_lik_merged = log_d_multinomial(x=merged_catalog_j, | ||
size=numpy.sum(merged_catalog_j), | ||
prob=pr_merged_cat) | ||
log_lik_union = log_d_multinomial(x=union_catalog_counts_mod, | ||
size=numpy.sum(union_catalog_counts_mod), | ||
prob=pr_union_cat) | ||
log_like_cat_j = log_d_multinomial(x=catalog_counts_mod, | ||
size=numpy.sum(catalog_counts_mod), | ||
prob=pr_cat_j) | ||
|
||
return 2 * (log_lik_merged - log_lik_union - log_like_cat_j) | ||
Shouldn't this be -2? Or did we flip the score? I'm always confused when flipping.
Yes, it should, according to the definition in the manuscript. I'll change that. The minus sign does not change any property of the MLL statistic; it only makes the score positively or negatively oriented, so there is no issue with changing the definition. |
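To make the sign discussion concrete, here is a minimal, self-contained sketch of the score as it is written in this diff. The `sign` argument and the name `mll_score_sketch` are hypothetical (they are not part of the PR), and the toy counts are made up purely for illustration. It only shows that switching between 2 and -2 mirrors the score and reverses the ranking of catalogs, without changing anything else.

```python
import numpy
import scipy.special


def log_d_multinomial(x, size, prob):
    # Multinomial log-likelihood via log-gamma, copied from the diff above.
    return scipy.special.loggamma(size + 1) + numpy.sum(
        x * numpy.log(prob) - scipy.special.loggamma(x + 1))


def mll_score_sketch(union_counts, catalog_counts, sign=2.0):
    # `sign` is a hypothetical knob: 2.0 reproduces the diff, -2.0 the flipped score.
    union_counts = numpy.asarray(union_counts, dtype=float)
    catalog_counts = numpy.asarray(catalog_counts, dtype=float)

    # Same modifications as in MLL_score: add N_u / N_j to the union, 1 to the catalog.
    union_mod = union_counts + union_counts.sum() / catalog_counts.sum()
    cat_mod = catalog_counts + 1
    merged = union_mod + cat_mod

    def log_lik_at_own_proportions(counts):
        # Evaluate the likelihood of `counts` at its own normalized proportions.
        total = counts.sum()
        return log_d_multinomial(counts, total, counts / total)

    return sign * (log_lik_at_own_proportions(merged)
                   - log_lik_at_own_proportions(union_mod)
                   - log_lik_at_own_proportions(cat_mod))


union = [40, 30, 20, 10]   # toy union-of-catalogs counts (made up)
catalog_a = [4, 3, 2, 1]   # toy catalog with the same shape as the union
catalog_b = [1, 2, 3, 4]   # toy catalog with the opposite shape

pos_a, pos_b = (mll_score_sketch(union, c) for c in (catalog_a, catalog_b))
neg_a, neg_b = (mll_score_sketch(union, c, sign=-2.0) for c in (catalog_a, catalog_b))
print(pos_a, pos_b, neg_a, neg_b)
```

Whichever orientation is chosen, the docstring formula and the return statement should agree on it.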
This requires docstrings explaining how it was calculated and/or derived from the MS.
It's the multinomial log-likelihood, which is given by
log(Γ(size + 1)) + Σ_i [x_i log(prob_i) - log(Γ(x_i + 1))]
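As a quick sanity check of that expression, the sketch below compares `log_d_multinomial` (copied from the diff) with `scipy.stats.multinomial.logpmf` on a small integer example. The counts and probabilities are made up for illustration. Note that the SciPy reference only applies to integer counts, whereas the log-gamma form also accepts the non-integer counts that `MLL_score` produces after adding `N_u / N_j`.

```python
import numpy
import scipy.special
import scipy.stats


def log_d_multinomial(x, size, prob):
    # log(Γ(size + 1)) + Σ_i [x_i log(prob_i) - log(Γ(x_i + 1))]
    return scipy.special.loggamma(size + 1) + numpy.sum(
        x * numpy.log(prob) - scipy.special.loggamma(x + 1))


x = numpy.array([3, 1, 2])           # toy integer counts (made up)
prob = numpy.array([0.5, 0.2, 0.3])  # toy bin probabilities, summing to 1
size = int(x.sum())

ours = log_d_multinomial(x, size, prob)
reference = scipy.stats.multinomial.logpmf(x, n=size, p=prob)
print(ours, reference)
```

Both values should agree to floating-point precision, since Γ(n + 1) = n! for integer n.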