vignettes/jointErrorRateCalibration_simulations.Rmd
jointErrorRateCalibration_simulations.Rmd
This vignettes illustrates the following points
We start with two-sample tests because we believe they are used more frequently.
library("sanssouci")
#set.seed(0xBEEF) # for reproducibility
library("ggplot2")
Parameters:
m <- 5e2
n <- 30
pi0 <- 0.8
rho <- 0.3
SNR <- 3
We use the function SansSouciSim
to generate Gaussian equi-correlated samples. We then ‘fit’ the resulting SansSouci object
obj <- SansSouciSim(m, rho, n, pi0, SNR = SNR, prob = 0.5)
obj
## 'SansSouci' object:
## Number of hypotheses: 500
## Number of observations: 30
## 2-sample data
##
## Truth:
## 100 false null hypotheses (signals) out of 500 (pi0=0.8)
We perform JER calibration using the linear (Simes) template \(\mathfrak{R(\lambda)}=(R_1(\lambda),\) \(\ldots,R_K(\lambda)\), where for \(1 \leq k \leq K\)
\[R_k(\lambda) = \left\{i\in \{1, \dots , m\}\::\: \bar{\Phi}(Z_i) > \frac{\lambda k}{m} \right\}\,\]
where \(\bar{\Phi}\) is the cdf of the \(\mathcal{N}(0,1)\) distribution. Note that \(p_i = \bar{\Phi}(Z_i)\) is the one-sided \(p\)-value associated to the test statistics \(Z_i\).
B <- 1000
alpha <- 0.2
cal <- fit(obj, alpha = alpha, B = B, family = "Simes")
cal
## 'SansSouci' object:
## Number of hypotheses: 500
## Number of observations: 30
## 2-sample data
##
## Truth:
## 100 false null hypotheses (signals) out of 500 (pi0=0.8)
## Parameters:
## Test function: rowWelchTests
## Number of permutations: B=1000
## Significance level: alpha=0.2
## Reference family: Simes
## (of size: K=500)
##
## Output:
## Calibration parameter: lambda=0.269668
The output of the calibration is the following SansSouci
object. In particular:
cal$input
contains the input simulated data (and the associated truth)cal$param
contains the calibration parameterscal$output
contains the output of the calibration, including
p.values
: \(m\) test statistics calculated by permutationthr
: A JER-controlling family of \(K\) (here \(K=m\)) elementslambda
: the \(\lambda\)-calibration parameterBecause we are under positive equi-correlation, we expect \(\lambda > \alpha\) for the Simes family.
cal$output$lambda
## 20%
## 0.269668
cal$output$lambda > alpha
## 20%
## TRUE
The fitted SansSouci
object contains post hoc confidence bounds:
## x label stat bound
## 1 1 Simes TP 1
## 2 2 Simes TP 2
## 3 3 Simes TP 3
## 4 4 Simes TP 4
## 5 5 Simes TP 5
## 6 6 Simes TP 6
We compare it to the true number of false positives among the most significant items, and to the Simes bound without calibration
cal0 <- fit(obj, alpha = alpha, B = 0, family = "Simes")
oracle <- fit(obj, alpha = alpha, family = "Oracle")
confs <- list(Simes = predict(cal0, all = TRUE),
"Simes+calibration" = predict(cal, all = TRUE),
"Oracle" = predict(oracle, all = TRUE))
plotConfCurve(confs, xmax = 200)
The code is identical, except for the line to generate the observations (where we do not specify a probability of belonging to one of the two populations using the prob
argument); moreover it is not necessary to specify a vector of categories ‘categ’ in ‘calibrateJER’.
obj <- SansSouciSim(m, rho, n, pi0, SNR = SNR)
obj
## 'SansSouci' object:
## Number of hypotheses: 500
## Number of observations: 30
## 1-sample data
##
## Truth:
## 100 false null hypotheses (signals) out of 500 (pi0=0.8)
cal <- fit(obj, alpha = alpha, B = B, family = "Simes")
cal
## 'SansSouci' object:
## Number of hypotheses: 500
## Number of observations: 30
## 1-sample data
##
## Truth:
## 100 false null hypotheses (signals) out of 500 (pi0=0.8)
## Parameters:
## Test function: rowZTests
## Number of permutations: B=1000
## Significance level: alpha=0.2
## Reference family: Simes
## (of size: K=500)
##
## Output:
## Calibration parameter: lambda=0.2749622
Again we expect \(\lambda > \alpha\).
The associated confidence curves are displayed below:
cal0 <- fit(obj, alpha = alpha, B = 0, family = "Simes")
oracle <- fit(obj, alpha = alpha, family = "Oracle")
confs <- list(Simes = predict(cal0, all = TRUE),
"Simes+calibration" = predict(cal, all = TRUE),
"Oracle" = predict(oracle, all = TRUE))
plotConfCurve(confs)
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.4.0 sanssouci_0.12.8
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.9 highr_0.9 pillar_1.8.1
## [4] bslib_0.4.1 compiler_4.2.2 jquerylib_0.1.4
## [7] tools_4.2.2 digest_0.6.30 tibble_3.1.8
## [10] jsonlite_1.8.3 evaluate_0.18 memoise_2.0.1
## [13] lifecycle_1.0.3 gtable_0.3.1 lattice_0.20-45
## [16] pkgconfig_2.0.3 rlang_1.0.6 Matrix_1.5-3
## [19] cli_3.4.1 yaml_2.3.6 pkgdown_2.0.6
## [22] xfun_0.34 fastmap_1.1.0 withr_2.5.0
## [25] stringr_1.4.1 knitr_1.40 vctrs_0.5.0
## [28] desc_1.4.2 generics_0.1.3 fs_1.5.2
## [31] sass_0.4.2 systemfonts_1.0.4 rprojroot_2.0.3
## [34] grid_4.2.2 glue_1.6.2 R6_2.5.1
## [37] textshaping_0.3.6 fansi_1.0.3 rmarkdown_2.18
## [40] farver_2.1.1 purrr_0.3.5 magrittr_2.0.3
## [43] scales_1.2.1 htmltools_0.5.3 matrixStats_0.62.0
## [46] matrixTests_0.1.9.1 colorspace_2.0-3 labeling_0.4.2
## [49] ragg_1.2.4 utf8_1.2.2 stringi_1.7.8
## [52] munsell_0.5.0 cachem_1.0.6