Understanding the p values
Fisher's test
Fisher's exact test determines whether there is an association between two categorical variables, here the presence/absence of a gene and the state of a trait.
Scoary2 uses Fisher's test to remove uncorrelated genes from the analysis. This avoids having to run the computationally intensive post-hoc tests for all genes.
Because many genes are tested, Fisher's test p-values are adjusted for multiple testing, resulting in q-values.
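The filtering step can be sketched in pure Python as a two-sided Fisher's exact test on a gene-by-trait 2x2 table, followed by a multiple-testing correction. This is a minimal illustration, not Scoary2's code. The function names are hypothetical. The text does not say which correction Scoary2 applies, so Benjamini-Hochberg is shown only as one common choice.

```python
from math import comb


def fisher_exact_two_sided(n11: int, n12: int, n21: int, n22: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[n11, n12], [n21, n22]].

    Rows: gene present / absent; columns: trait present / absent.
    All tables with the same margins are enumerated; the p-value is the total
    probability of the tables that are at most as likely as the observed one.
    """
    row1, col1 = n11 + n12, n11 + n21
    n = n11 + n12 + n21 + n22
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    p_obs = prob(n11)
    # The small relative tolerance treats floating-point near-ties as ties.
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

For instance, `fisher_exact_two_sided(8, 0, 0, 8)` returns 2 / 12870 ≈ 0.000155, which would round to the 0.00016 reported in the examples below if those trees contain 8 g+t+ and 8 g-t- isolates (an assumption, since the figures are not reproduced here).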
Advantages
Simple and fast test that measures how strongly a gene and a trait correlate.
Disadvantages
Importantly, one of the assumptions of Fisher's test is violated in mGWAS: isolates are not independent samples. An isolate's probability of exhibiting each state is not random and independent of the other isolates, because related isolates inherit their states from common ancestors. (Why? See Brynildsrud, 2016, Figure 2.)
For this reason, the resulting p-value is not conservative enough and cannot be interpreted straightforwardly: Fisher's test will likely find too many significant associations. To deal with this, Scoary2 includes post-hoc pairwise comparisons tests, described below.
Further reading
Wikipedia, fast-fisher Python library
Best / worst pairwise comparisons
In contrast to Fisher's test, the pairwise comparisons test takes population structure into account: instead of considering each isolate as an independent sample, this test focuses on evolutionary transitions. The goal is to find the maximum number of phylogenetically non-intersecting pairs of isolates that contrast in the state of both genotype and phenotype. See Brynildsrud, 2016.
Calculation
There are many ways of "picking" nodes in the tree as evolutionary transitions. Scoary2 implements two extreme solutions: the "best", most optimistic and the "worst", most pessimistic picking.
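Let $c$ be the maximum number of phylogenetically non-intersecting contrasting pairs. A pair supports the hypothesis if it contrasts g-t- with g+t+, and contradicts it if it contrasts g-t+ with g+t-. Then:
- A "best" picking is one with $c$ contrasting pairs where as many as possible ($b$) support the hypothesis.
- A "worst" picking is one with $c$ contrasting pairs where as many as possible ($w$) contradict the hypothesis.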
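To make "support" and "contradict" concrete, the sketch below classifies a pair of isolates, each given as a hypothetical `(gene_present, trait_present)` tuple, and tallies $c$, $b$ and $w$ for one given picking. It does not search the tree for the best or worst picking (that requires the phylogenetic pairing algorithm, which is not shown). All names here are illustrative assumptions, not Scoary2's API.

```python
from enum import Enum


class PairKind(Enum):
    SUPPORTING = "supporting"  # g-t- | g+t+
    OPPOSING = "opposing"  # g-t+ | g+t-
    NOT_CONTRASTING = "not contrasting"


def classify_pair(x: tuple[bool, bool], y: tuple[bool, bool]) -> PairKind:
    """Classify two isolates given as (gene_present, trait_present).

    A pair is contrasting only if the isolates differ in BOTH gene and trait.
    """
    (gx, tx), (gy, ty) = x, y
    if gx == gy or tx == ty:
        return PairKind.NOT_CONTRASTING
    # Both states flip across the pair: gene and trait either co-vary or anti-vary.
    return PairKind.SUPPORTING if gx == tx else PairKind.OPPOSING


def count_pairs(pairs: list[tuple[tuple[bool, bool], tuple[bool, bool]]]) -> tuple[int, int, int]:
    """Return (c, b, w) for one picking: contrasting, supporting and opposing pairs."""
    kinds = [classify_pair(x, y) for x, y in pairs]
    c = sum(k is not PairKind.NOT_CONTRASTING for k in kinds)
    b = kinds.count(PairKind.SUPPORTING)
    w = kinds.count(PairKind.OPPOSING)
    return c, b, w
```

Within a single picking, every contrasting pair is either supporting or opposing, so $c = b + w$; the "best" and "worst" pickings differ in how that split falls.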
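The null hypothesis is that each contrasting pair is equally likely to support or to contradict the hypothesis, i.e. the success probability is 0.5.
The p-values are calculated using a two-sided binomial test:
- "best" p-value: $\mathrm{binom\_test}(b,\ n=c)$
- "worst" p-value: $\mathrm{binom\_test}(w,\ n=c)$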
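The two formulas can be sketched with an exact two-sided binomial test written in pure Python. This assumes a success probability of 0.5 and the usual "sum all outcomes at most as likely as the observed one" convention (the one `scipy.stats.binomtest` uses); under those assumptions it reproduces the "best"/"worst" values of the examples below. The function names are hypothetical.

```python
from math import comb


def binom_test_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test for k successes out of n trials."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def best_worst_pvalues(c: int, b: int, w: int) -> tuple[float, float]:
    """'best' = binom_test(b, n=c); 'worst' = binom_test(w, n=c)."""
    return binom_test_two_sided(b, c), binom_test_two_sided(w, c)


# Example 1: c = 8, b = 8, w = 0  ->  both p-values are 2 * 0.5**8 = 0.0078125
print(best_worst_pvalues(8, 8, 0))
# Example 2: c = 1, b = 1, w = 0  ->  both p-values are 1.0
print(best_worst_pvalues(1, 1, 0))
```

Because the test is two-sided, $w = 0$ out of $c = 8$ is just as extreme as $b = 8$ out of $c = 8$, which is why both p-values coincide in Example 1.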
Advantages
Takes population structure into account: the focus is not on mere correlation as in Fisher's test, but on evolutionary transitions.
Makes very few assumptions on the evolutionary process.
Disadvantages
It is not clear how to interpret these p-values or the range between them (Brynildsrud, 2016). To get a more readily interpretable p-value, read the section on the permutation test below.
The pairwise comparisons test is arguably too conservative because it only takes into account a fraction of the available data (Maddison, 2014; Felsenstein, 1985; Grafen, 1996).
Note
Scoary2's parameter `worst_cutoff` allows one to skip traits where no gene has a "worst" p-value lower than a certain threshold, i.e. traits that correlate very strongly with the phylogeny. This can greatly speed up the analysis of datasets with very many traits and strong population structure effects (for example, multiple species).
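The skipping rule can be sketched as a simple filter. The input layout (a dict mapping each trait to the "worst" p-values of its genes) and the function name are hypothetical; only the rule itself, keep a trait if at least one gene's "worst" p-value is below the cutoff, comes from the text.

```python
def traits_to_analyze(worst_pvalues: dict[str, list[float]], worst_cutoff: float) -> list[str]:
    """Keep traits where at least one gene has a 'worst' p-value below worst_cutoff.

    Traits where every gene's 'worst' p-value is >= worst_cutoff are skipped,
    since such traits correlate very strongly with the phylogeny.
    """
    return [
        trait
        for trait, pvals in worst_pvalues.items()
        if any(p < worst_cutoff for p in pvals)
    ]


print(traits_to_analyze({"trait_a": [0.001, 0.5], "trait_b": [1.0, 0.9]}, 0.05))
```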
Further reading
Read & Nee, 1995; Maddison, 2000; Brynildsrud, 2016
Permutation test
The goal here is to calculate a p-value based on pairwise comparisons that is more readily interpretable than the "best" / "worst" p-values described above.
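The idea is to make many random permutations of the trait, i.e. to shuffle the trait states among the isolates while keeping the phylogeny and the distribution of the gene fixed, and to recompute the pairwise comparisons for each permuted trait.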
If
Advantages
Same advantages as Best / worst pairwise comparisons, but the p-value is readily interpretable:
- Given the phylogeny and the distribution of the gene, it measures how likely one is to find a random trait that yields at least as many $b$ per $c$ (supporting pairs per contrasting pair) as the observed trait.
- If the p-value is high, the trait strongly correlates with the phylogeny and its association with the gene is more likely to be spurious.
- If the p-value is low, the trait weakly correlates with the phylogeny, lending more credence to the hypothesis that the gene is causally linked to the trait.
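The interpretation above can be sketched as a generic permutation test. This is a minimal illustration under stated assumptions, not Scoary2's implementation: trait states are shuffled among isolates, the statistic (for example $b/c$ from the "best" picking) is passed in as a callable because computing it requires the tree-based pairing algorithm, and the `(k + 1) / (n_permut + 1)` estimator is one common choice. All names and defaults are hypothetical.

```python
import random
from collections.abc import Callable, Sequence


def empirical_pvalue(
    trait: Sequence[bool],
    statistic: Callable[[Sequence[bool]], float],
    n_permut: int = 1000,
    seed: int = 42,
) -> float:
    """Permutation p-value: how often a shuffled trait scores at least as high.

    `trait` holds one state per isolate, in a fixed order that matches the tree
    and the gene vector, so shuffling it keeps phylogeny and gene distribution fixed.
    """
    rng = random.Random(seed)
    observed = statistic(trait)
    shuffled = list(trait)
    n_at_least = 0
    for _ in range(n_permut):
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            n_at_least += 1
    # Counting the observed trait as one extra permutation keeps the estimate above 0.
    return (n_at_least + 1) / (n_permut + 1)
```

With this estimator the smallest attainable p-value is 1 / (n_permut + 1), so the number of permutations bounds how strong the evidence can look.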
Disadvantages
Permutation tests are computationally intensive.
The pairwise comparisons test is arguably too conservative because it only takes into account a fraction of the available data (Maddison, 2014; Felsenstein, 1985; Grafen, 1996).
Further reading
This is not a p-value and cannot be interpreted as such. It is merely used as an empirically-derived score in Scoary2 to sort the genes by how promising they are.
To illustrate what these p-values mean, let's consider these two examples:
- blue: g+t+ (gene present, trait present)
- green: g-t- (gene absent, trait absent)
Example 1
- Fisher's test: 0.00016
- $c$: 8 (eight non-intersecting contrasting pairs can be made)
- $b$: 8 (all of them support the hypothesis: g-t- | g+t+)
- $w$: 0 (none contradict the hypothesis: g-t+ | g+t-)
- "best" p-value: 0.0078
- "worst" p-value: 0.0078
- permuted p-value: 0.083
Here, the trait and the gene strongly correlate (Fisher's test), and the correlation does not stem from the population structure (best/worst/permuted p-values).
Example 2
- Fisher's test: 0.00016
- $c$: 1 (only one contrasting pair possible: at the root node)
- $b$: 1 (this pair supports the hypothesis)
- $w$: 0 (none contradict the hypothesis: g-t+ | g+t-)
- "best" p-value: 1
- "worst" p-value: 1
- permuted p-value: 1
Here, the trait and the gene correlate as strongly as in Example 1 (Fisher's test), but there is only one evolutionary transition. We conclude that the data constitute weak evidence for causality, as illustrated by the best/worst/permuted p-values.