Bayesian p-value
Traditionally, p-value is a frequent concept, not a Bayesian concept.
Since its importance in hypothesis testing, there now exist several different kinds of Bayesian p-value.
- Probability of Direction (pd)
- Bayes factors
- Bayesian p-value based on the density at the Maximum A Posteriori (MAP)
The one we introduce here might be technically called posterior predictive p-value, which is quite similar to the old-fashioned p-value.
- In calculaing the posterior predictive p-value, instead of assuming parameter \(\theta\) of the null distribution f(\(\theta\)) is fixed,
- we have a distribution for parameter \(\theta\).
- Therefore, the Bayesian p-value is
- The advantage of Bayesian p-value is that it incorporates the uncertainty of distribution parameter \(\theta\).
- More uncertain parameter \(\theta\) is, the less significant the Posterior predictive p-value should be.
An example
For example, we want to compute the p-value of seeing 20 mutation supporting-reads (SR) out of 1000 total reads (Depth) with a mutation(error) rate (\(\epsilon\)) of 0.01 (1%).
- This can be framed as a classical Binomial distribution problem.
- The non-Bayesian p-value = \(P(X>19)\), with \(P(x) = \binom{1000}{x} \epsilon^{x} (1-\epsilon)^{1000-x}\).
- It is 0.00328835, as caculated below.
- However, in the Bayesian sense, the key parameter, \(\epsilon\) is not fixed and has its own distribution.
- To simplify matters, we assume \(\epsilon\) follows a right-truncated Normal distribution with mean(\(\mu\))=0.01 and varying \(\sigma\) (standard deviation).
- As seen in the following table, if \(\sigma\) is very small, Bayesian p-value is identical to the pvalue (non-Bayesian).
- But as \(\sigma\) increases from 0.00001 to 0.01, the Bayesian p-value is getting bigger (less significant), which makes sense.
Table 1. p-value of seeing 20 mutation supporting reads out of 1000 total reads.
mean \(\epsilon\) | stddev of \(\epsilon\) | p-value |
---|---|---|
0.01 | 0 (non-Bayesian) | 0.00328835 |
0.01 | 0.00001 | 0.00328898 |
0.01 | 0.0001 | 0.00495702 |
0.01 | 0.001 | 0.06580617 |
0.01 | 0.01 | 0.20495621 |
Applications
We in fact observed this phenomenon in our precision oncology data. Different sequencers may have similar mean error rates but the standard deviations of the error rate differ quite significantly. In using Bayesian p-value, we were able to achieve the same specificity across different sequencers.
p-value (non-Bayesian) calculated in Julia.
let
ϵ_mean = 0.01;
Depth = 1000; SR = 20
1 - cdf(Binomial(Depth, ϵ_mean), SR - 1)
end
0.0032883597877274573
Bayesian p-value calculated in Julia
function calc_binomial_bayesian_pvalue(;ϵ_mean = 0.01, ϵ_σ = 0.001, no_of_samplings = 1000,
Depth = 1000, SR = 20)
norm_dist = Normal(ϵ_mean, ϵ_σ)
ϵ_simu_vec = rand(truncated(norm_dist, lower=0.000001, upper=0.3), no_of_samplings)
# if substitute loglikelihood() for pdf(), everything has to be adjusted.
#pdf_vec = [pdf(norm_dist, ϵ) for ϵ in ϵ_simu_vec]
#weight_vec = pdf_vec ./ sum(pdf_vec)
bayesian_p_value = 0
for i in 1:no_of_samplings
bayesian_p_value += (1 - cdf(Binomial(Depth, ϵ_simu_vec[i]), SR - 1))
end
bayesian_p_value/no_of_samplings
end
calc_binomial_bayesian_pvalue(ϵ_mean = 0.01, ϵ_σ = 0.00001, no_of_samplings = 1000,
Depth = 1000, SR = 20)
0.003288984110378643
calc_binomial_bayesian_pvalue(ϵ_mean = 0.01, ϵ_σ = 0.001, no_of_samplings = 1000,
Depth = 1000, SR = 20)
0.004957029252920819
calc_binomial_bayesian_pvalue(ϵ_mean = 0.01, ϵ_σ = 0.005, no_of_samplings = 1000,
Depth = 1000, SR = 20)
0.06580617306018349
calc_binomial_bayesian_pvalue(ϵ_mean = 0.01, ϵ_σ = 0.01, no_of_samplings = 1000,
Depth = 1000, SR = 20)
0.2049562118743039
Enjoy Reading This Article?
Here are some more articles you might like to read next: