Traditionally, p-value is a frequentist concept, not a Bayesian concept.

Since its importance in hypothesis testing, there now exist several different kinds of Bayesian p-value.

The one we introduce here might be technically called posterior predictive p-value, which is quite similar to the old-fashioned p-value.

In calculaing the posterior predictive p-value, instead of assuming parameter \(\theta\) of the null distribution f(\(\theta\)) is fixed,
- we have a distribution for parameter \(\theta\).
Therefore, the Bayesian p-value is

\[\int P(X>x|\theta)P(\theta) d\theta\]

The advantage of Bayesian p-value is that it incorporates the uncertainty of distribution parameter \(\theta\).
- Generally, the more uncertain parameter \(\theta\) is, the less significant the Bayesian p-value should be.

An example

For example, we want to compute the p-value of seeing 20 mutation-supporting reads (SR) out of 1000 total reads (Depth) with a mutation(error) rate (\(\epsilon\)) of 0.01 (1%).

This can be framed as a classical Binomial distribution problem.
The non-Bayesian p-value = \(P(X>19)\), with \(P(x) = \binom{1000}{x} \epsilon^{x} (1-\epsilon)^{1000-x}\).
- It is 0.00328835, as caculated below.
However, in the Bayesian sense, the key parameter, \(\epsilon\) is not fixed and has its own distribution.
- To simplify matters, we assume \(\epsilon\) follows a right-truncated Normal distribution with mean(\(\mu\))=0.01 and varying \(\sigma\) (standard deviation).
- As seen in the following table, if \(\sigma\) is very small, Bayesian p-value is identical to the pvalue (non-Bayesian).
- But as \(\sigma\) increases from 0.00001 to 0.01, the Bayesian p-value is getting bigger (less significant), which makes sense.

Table 1. p-value of seeing 20 mutation supporting reads out of 1000 total reads.

mean \(\epsilon\)	stddev of \(\epsilon\)	p-value
0.01	0 (non-Bayesian)	0.00328835
0.01	0.00001	0.00328898
0.01	0.0001	0.00495702
0.01	0.001	0.06580617
0.01	0.01	0.20495621

Conclusions

We in fact observed this phenomenon in our precision oncology data. Different sequencers may have similar mean error rates but the standard deviations of the error rate differ quite significantly. In using Bayesian p-value, we were able to achieve the same specificity across different sequencers.

Appendix

A.1 p-value (non-Bayesian) calculated in Julia.

let
    ϵ_mean = 0.01;
    Depth = 1000; SR = 20
    1 - cdf(Binomial(Depth, ϵ_mean), SR - 1)
end

0.0032883597877274573

A.2 Bayesian p-value calculated in Julia

function calc_binomial_bayesian_pvalue(;ϵ_mean = 0.01, ϵ_σ = 0.001, no_of_samplings = 1000, 
    Depth = 1000, SR = 20)
    norm_dist = Normal(ϵ_mean, ϵ_σ)
    ϵ_simu_vec = rand(truncated(norm_dist, lower=0.000001, upper=0.3), no_of_samplings)
    # if substitute loglikelihood() for pdf(),  everything has to be adjusted.
    #pdf_vec = [pdf(norm_dist, ϵ) for ϵ in ϵ_simu_vec]
    #weight_vec = pdf_vec ./ sum(pdf_vec)
    bayesian_p_value = 0
    for i in 1:no_of_samplings
        bayesian_p_value += (1 - cdf(Binomial(Depth, ϵ_simu_vec[i]), SR - 1))
    end
    bayesian_p_value/no_of_samplings
end

calc_binomial_bayesian_pvalue(ϵ_mean = 0.01, ϵ_σ = 0.00001, no_of_samplings = 1000, 
    Depth = 1000, SR = 20)
0.003288984110378643

calc_binomial_bayesian_pvalue(ϵ_mean = 0.01, ϵ_σ = 0.001, no_of_samplings = 1000, 
    Depth = 1000, SR = 20)
0.004957029252920819

calc_binomial_bayesian_pvalue(ϵ_mean = 0.01, ϵ_σ = 0.005, no_of_samplings = 1000, 
    Depth = 1000, SR = 20)
0.06580617306018349

calc_binomial_bayesian_pvalue(ϵ_mean = 0.01, ϵ_σ = 0.01, no_of_samplings = 1000, 
    Depth = 1000, SR = 20)
0.2049562118743039

Bayesian p-value/贝叶斯p-value