Bayesian p-value

Traditionally, p-value is a frequent concept, not a Bayesian concept.

Since its importance in hypothesis testing, there now exist several different kinds of Bayesian p-value.

The one we introduce here might be technically called posterior predictive p-value, which is quite similar to the old-fashioned p-value.

  • In calculaing the posterior predictive p-value, instead of assuming parameter \(\theta\) of the null distribution f(\(\theta\)) is fixed,
    • we have a distribution for parameter \(\theta\).
  • Therefore, the Bayesian p-value is
\[\int P(X>x|\theta)P(\theta) d\theta\]
  • The advantage of Bayesian p-value is that it incorporates the uncertainty of distribution parameter \(\theta\).
    • More uncertain parameter \(\theta\) is, the less significant the Posterior predictive p-value should be.

An example

For example, we want to compute the p-value of seeing 20 mutation supporting-reads (SR) out of 1000 total reads (Depth) with a mutation(error) rate (\(\epsilon\)) of 0.01 (1%).

  • This can be framed as a classical Binomial distribution problem.
  • The non-Bayesian p-value = \(P(X>19)\), with \(P(x) = \binom{1000}{x} \epsilon^{x} (1-\epsilon)^{1000-x}\).
    • It is 0.00328835, as caculated below.
  • However, in the Bayesian sense, the key parameter, \(\epsilon\) is not fixed and has its own distribution.
    • To simplify matters, we assume \(\epsilon\) follows a right-truncated Normal distribution with mean(\(\mu\))=0.01 and varying \(\sigma\) (standard deviation).
    • As seen in the following table, if \(\sigma\) is very small, Bayesian p-value is identical to the pvalue (non-Bayesian).
    • But as \(\sigma\) increases from 0.00001 to 0.01, the Bayesian p-value is getting bigger (less significant), which makes sense.

Table 1. p-value of seeing 20 mutation supporting reads out of 1000 total reads.

mean \(\epsilon\) stddev of \(\epsilon\) p-value
0.01 0 (non-Bayesian) 0.00328835
0.01 0.00001 0.00328898
0.01 0.0001 0.00495702
0.01 0.001 0.06580617
0.01 0.01 0.20495621

Applications

We in fact observed this phenomenon in our precision oncology data. Different sequencers may have similar mean error rates but the standard deviations of the error rate differ quite significantly. In using Bayesian p-value, we were able to achieve the same specificity across different sequencers.

p-value (non-Bayesian) calculated in Julia.

let
    ϵ_mean = 0.01;
    Depth = 1000; SR = 20
    1 - cdf(Binomial(Depth, ϵ_mean), SR - 1)
end

0.0032883597877274573

Bayesian p-value calculated in Julia

function calc_binomial_bayesian_pvalue(;ϵ_mean = 0.01, ϵ_σ = 0.001, no_of_samplings = 1000, 
    Depth = 1000, SR = 20)
    norm_dist = Normal(ϵ_mean, ϵ_σ)
    ϵ_simu_vec = rand(truncated(norm_dist, lower=0.000001, upper=0.3), no_of_samplings)
    # if substitute loglikelihood() for pdf(),  everything has to be adjusted.
    #pdf_vec = [pdf(norm_dist, ϵ) for ϵ in ϵ_simu_vec]
    #weight_vec = pdf_vec ./ sum(pdf_vec)
    bayesian_p_value = 0
    for i in 1:no_of_samplings
        bayesian_p_value += (1 - cdf(Binomial(Depth, ϵ_simu_vec[i]), SR - 1))
    end
    bayesian_p_value/no_of_samplings
end

calc_binomial_bayesian_pvalue(ϵ_mean = 0.01, ϵ_σ = 0.00001, no_of_samplings = 1000, 
    Depth = 1000, SR = 20)
0.003288984110378643
calc_binomial_bayesian_pvalue(ϵ_mean = 0.01, ϵ_σ = 0.001, no_of_samplings = 1000, 
    Depth = 1000, SR = 20)
0.004957029252920819
calc_binomial_bayesian_pvalue(ϵ_mean = 0.01, ϵ_σ = 0.005, no_of_samplings = 1000, 
    Depth = 1000, SR = 20)
0.06580617306018349
calc_binomial_bayesian_pvalue(ϵ_mean = 0.01, ϵ_σ = 0.01, no_of_samplings = 1000, 
    Depth = 1000, SR = 20)
0.2049562118743039



Enjoy Reading This Article?

Here are some more articles you might like to read next:

  • Logit or Logistic Regression/回归?
  • 贝叶斯统计 Statistical Rethinking v2 in Julia 教学视频 第1章
  • 7 Julia correctness issues in "Why I no longer recommend Julia". Now fixed.