Is your feature request related to a problem? Please describe.
This relates to #997. There's no way (at least not in Orange) to get a table with the keywords per cluster. They can only be obtained in the graphical representation of the Annotated Corpus Map. Sometimes you'd like to present the characteristic keywords per cluster in a table - for instance together with other information about the cluster that can easily be obtained with Group By > Cluster (e.g., the number of documents in each cluster).
The help information says "FDR Threshold sets the threshold for selecting a keyword as a cluster's keyword", but only the uncorrected p-values per word are available in the Scores output.
Describe the solution you'd like
Replace the uncorrected p-values with the corrected ones. Or, even better, provide an output that lists the keywords per cluster directly (e.g., a table with the columns Cluster, Keyword 1, Keyword 2, ... Keyword 5). Even with the corrected p-values, I don't see a way of building such a table, especially if the number of clusters isn't kept constant (e.g., when varying the number of clusters to see the effects).
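To make the requested table concrete, here is a minimal sketch of the reshaping step in plain Python. It assumes the scores are available as (cluster, word, FDR) tuples, which is a hypothetical layout and not necessarily what the Scores output looks like; the function name, the default threshold of 0.05, and the default of five keywords are likewise illustrative assumptions.

```python
from collections import defaultdict


def keywords_per_cluster(scores, fdr_threshold=0.05, top_n=5):
    """Build a wide table: one row per cluster, top_n keyword columns.

    scores: iterable of (cluster, word, fdr) tuples (assumed layout).
    Returns (header, rows); rows are padded with "" when a cluster has
    fewer than top_n words at or below the threshold.
    """
    by_cluster = defaultdict(list)
    for cluster, word, fdr in scores:
        bucket = by_cluster[cluster]  # registers the cluster even if no word passes
        if fdr <= fdr_threshold:
            bucket.append((fdr, word))

    header = ["Cluster"] + [f"Keyword {i}" for i in range(1, top_n + 1)]
    rows = []
    for cluster in sorted(by_cluster):
        # Smallest FDR first; ties are broken alphabetically by word.
        words = [word for _, word in sorted(by_cluster[cluster])[:top_n]]
        rows.append([cluster] + words + [""] * (top_n - len(words)))
    return header, rows
```

Because the rows are built from whatever cluster labels appear in the input, the same function works unchanged when the number of clusters varies between runs.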
Describe alternatives you've considered
None are known to me.
wvdvegte changed the title from "Annotated corpus map: provide correctedinstead of uncorrected p-value in Scores output" to "Annotated corpus map: provide corrected instead of uncorrected p-value in Scores output" on Jul 22, 2024
I now realize that the p-values in the Scores output are in fact the corrected ones, a.k.a. FDRs. But it remains confusing: why refer to FDR in the menu where the threshold is set, yet simply call them p-values in the Scores output? I suggest using the same term (either 'FDR' or 'corrected p-value') wherever the same number is meant.
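To illustrate why the label matters, the sketch below shows how raw p-values and FDR-adjusted values can differ for the same words. It uses the Benjamini-Hochberg procedure, which is one common FDR correction; whether the widget uses this exact procedure is an assumption here, and the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With a threshold of 0.05, a word with a raw p-value of 0.04 can pass on the raw scale but fail after adjustment, so a column labelled simply "p-value" invites the wrong reading.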