Annotated corpus map: provide corrected instead of uncorrected p-value in Scores output #1077

wvdvegte · 2024-07-22T13:54:35Z

Is your feature request related to a problem? Please describe.
This relates to #997. There's no way (at least not in Orange) to get a table with the keywords per cluster. They can only be obtained from the graphical representation in the Annotated Corpus Map. Sometimes you'd like to present the characteristic keywords per cluster in a table, for instance together with other information about the cluster that can easily be obtained with Group By > Cluster (e.g., the number of documents in each cluster).
The help text says "FDR Threshold sets the threshold for selecting a keyword as a cluster's keyword", but only the uncorrected p-values per word are available in the Scores output.

Describe the solution you'd like
Replace the uncorrected p-values with the corrected ones. Or, even better, provide some form of output that lists the keywords per cluster directly (e.g., a table with the columns Cluster, Keyword 1, Keyword 2, ..., Keyword 5). Even with the corrected p-values, I don't see a way of building such a table, especially when the number of clusters isn't kept constant (e.g., when varying the number of clusters to see the effects).
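Until such an output exists, a workaround might look like the minimal sketch below. It assumes the Scores output can be exported (for example to CSV) as a long table with one row per cluster-keyword pair and columns for cluster, keyword and FDR; that layout, the function name `keywords_per_cluster`, and the example data are all hypothetical. It keeps keywords at or below the FDR threshold and pivots them into one row per cluster, so it works for any number of clusters.

```python
from collections import defaultdict


def keywords_per_cluster(rows, fdr_threshold=0.05, top_n=5):
    """Pivot hypothetical (cluster, keyword, fdr) rows into one row per cluster.

    Each output row is [cluster, keyword_1, ..., keyword_top_n]. Clusters with
    fewer significant keywords are padded with empty strings, and clusters with
    none are still listed so the table always has one row per cluster.
    """
    clusters = set()
    significant = defaultdict(list)
    for cluster, keyword, fdr in rows:
        clusters.add(cluster)
        if fdr <= fdr_threshold:
            significant[cluster].append((fdr, keyword))

    table = []
    for cluster in sorted(clusters):
        # Lowest FDR first; ties are broken alphabetically by keyword.
        best = [kw for _, kw in sorted(significant[cluster])[:top_n]]
        table.append([cluster] + best + [""] * (top_n - len(best)))
    return table


# Hypothetical long-format scores: (cluster, keyword, FDR).
rows = [
    ("C1", "model", 0.001),
    ("C1", "network", 0.02),
    ("C1", "data", 0.30),
    ("C2", "protein", 0.004),
    ("C2", "cell", 0.01),
    ("C3", "market", 0.08),
]

top_n = 3
header = ["Cluster"] + [f"Keyword {i}" for i in range(1, top_n + 1)]
print("\t".join(header))
for row in keywords_per_cluster(rows, fdr_threshold=0.05, top_n=top_n):
    print("\t".join(row))
```

Listing clusters with no significant keywords (like `C3` here) keeps the row count equal to the number of clusters, which makes joining with a Group By > Cluster table straightforward.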

Describe alternatives you've considered
None are known to me.

wvdvegte · 2024-08-12T09:07:11Z

I have now realized that the p-values in the Scores output are actually the corrected ones, a.k.a. FDRs. But it remains confusing: why refer to FDR in the menu where the threshold is set, and simply call them p-values in the Scores output? I suggest using the same term (either 'FDR' or 'corrected p-value') wherever the same number is meant.
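To illustrate why the naming matters, here is a minimal sketch of the standard Benjamini-Hochberg adjustment, which turns raw p-values into FDRs (also called adjusted p-values or q-values). Whether the Annotated Corpus Map uses exactly this procedure is an assumption; the keywords and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDRs) in the original order."""
    n = len(pvalues)
    # Walk the p-values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = n - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four keywords in one cluster.
keywords = ["model", "cluster", "topic", "data"]
raw = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(raw)
print([round(q, 4) for q in fdr])  # -> [0.04, 0.0533, 0.0533, 0.2]

# With a threshold of 0.05, "cluster" and "topic" pass on raw p-values
# but not on FDR, so the two columns are not interchangeable.
for kw, p, q in zip(keywords, raw, fdr):
    print(kw, p <= 0.05, q <= 0.05)
```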

wvdvegte changed the title ~~Annotated corpus map: provide correctedinstead of uncorrected p-value in Scores output~~ Annotated corpus map: provide corrected instead of uncorrected p-value in Scores output Jul 22, 2024

wvdvegte mentioned this issue Aug 22, 2024

Annotated Corpus Map: suggestion for more meaningful Scores output #1079

Open
