Conservativeness of the Simes inequality under positive dependence
P. Neuvial
2024-06-24
Source: vignettes/Simes_equi-correlation.Rmd
The Simes inequality
Let \(\mathbf{q}=(q_i)_{1 \leq i \leq m}\) be a vector of random variables such that:
- for each \(i = 1, \ldots, m\), \(q_i \sim \mathcal{U}[0,1]\)
- \(\mathbf{q}\) satisfies positive regression dependency on a subset (PRDS)
We refer to Sarkar (1998) for a formal definition of this form of positive dependence. Here, we simply note that it holds in particular when the \(q_i\) are one-sided \(p\)-values of equi-correlated Gaussian test statistics with non-negative correlation.
Then, letting \((q_{(1)}, \ldots, q_{(m)})\) be the corresponding order statistics, we have:
\[ \mathbb{P} \left( \exists i \in \{1, \ldots, m\} \::\: q_{(i)} \leq \frac{\alpha i}{m}\right) \leq \alpha \]
This inequality is due to Simes (1986). It is sharp under independence of the \(q_i\) (that is, the above inequality is an equality). How sharp is it under positive dependence?
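To make the event inside the probability concrete, here is a minimal base-R sketch of the Simes test applied to a single vector of \(p\)-values. The function name `simes_rejects` is hypothetical (it is not part of any package) and is used only for illustration.

```r
# Hypothetical helper: does the Simes test reject at level alpha?
# Rejects if at least one ordered p-value q_(i) is <= alpha * i / m.
simes_rejects <- function(p, alpha = 0.05) {
  m <- length(p)
  any(sort(p) <= alpha * seq_len(m) / m)
}

p <- c(0.03, 0.03, 0.9)
simes_rejects(p, alpha = 0.05)    # TRUE: q_(2) = 0.03 <= 0.05 * 2 / 3
min(p) <= 0.05 / length(p)        # FALSE: Bonferroni does not reject here
```

The example also shows that the Simes test can reject when the Bonferroni test (which only looks at the smallest \(p\)-value) does not.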
Estimating the size of the Simes test
We write a small function to estimate by simulation the size of the Simes test, that is, the probability on the left-hand side of the above inequality, for a given equi-correlation level \(\rho\).
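library("sanssouci")

# Monte Carlo estimate of the size of the Simes test at level alpha, based on
# B replications of m equi-correlated Gaussian null statistics (correlation rho)
size <- function(rho, m, B, alpha = 0.05) {
    X0 <- gaussianTestStatistics(m, B, dep = "equi", param = rho)$X0
    p0 <- pnorm(X0, lower.tail = FALSE)    # one-sided p-values
    thr <- t_linear(alpha, 1:m, m)         # linear (Simes) thresholds
    p0s <- sanssouci:::colSort(p0)         # sort p-values within each column
    isBelow <- sweep(p0s, 1, thr, "<=")    # i-th ordered p-value vs i-th threshold
    nBelow <- colSums(isBelow)
    mean(nBelow > 0)                       # proportion of replications that reject
}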
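For readers who want to see what this simulation does without the package helpers, the following is a base-R sketch of the same idea. It rests on two assumptions that are not stated in this vignette: that equi-correlated Gaussian statistics can be generated through one shared Gaussian factor per replication, and that the linear thresholds are \(\alpha i/m\) as in the inequality above. The name `size_base` is hypothetical and is not part of sanssouci.

```r
# Hypothetical base-R counterpart of size(); a sketch, not the package code.
size_base <- function(rho, m, B, alpha = 0.05) {
  stopifnot(rho >= 0, rho <= 1, m >= 1, B >= 1)
  # Shared factor: one value per replication (column), repeated over the m rows
  Z0 <- matrix(rnorm(B), nrow = m, ncol = B, byrow = TRUE)
  Z  <- matrix(rnorm(m * B), nrow = m, ncol = B)
  # Each entry is N(0, 1); two entries in the same column have correlation rho
  X0 <- sqrt(rho) * Z0 + sqrt(1 - rho) * Z
  p0 <- pnorm(X0, lower.tail = FALSE)            # one-sided p-values
  p0s <- matrix(apply(p0, 2, sort), nrow = m)    # sort within each replication
  thr <- alpha * seq_len(m) / m                  # thresholds alpha * i / m
  # thr is recycled down each column, so row i is compared with thr[i]
  rejected <- colSums(p0s <= thr) > 0
  mean(rejected)
}

set.seed(1)
size_base(rho = 0,   m = 50, B = 2000, alpha = 0.2)
size_base(rho = 0.8, m = 50, B = 2000, alpha = 0.2)
```

With a fixed seed the two calls are reproducible; the exact values depend on the random number generator, so no expected output is given here.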
Our simulation parameters are set as follows:
m <- 1e3
rhos <- c(0, 0.1, 0.2, 0.4, 0.8)
alpha <- 0.2
B <- 1000
That is:
- \(1000\) hypotheses
- equi-correlation level between \(0\) and \(0.8\)
- target level of the Simes test: \(0.2\)
- \(1000\) replications are used to estimate the size of the test
We estimate the size of the test by \(\hat{\alpha}\), the proportion of the \(B\) replications in which the test rejects, and the corresponding standard error by \(\sqrt{\hat{\alpha}(1-\hat{\alpha})/B}\). Both quantities are reported relative to the target level \(\alpha\).
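The form of this standard error can be recovered with a short derivation. The symbol \(\alpha^\star\), denoting the true (unknown) size of the test, is notation introduced only for this sketch, and the \(B\) replications are assumed to be independent. Each replication then rejects with probability \(\alpha^\star\), so that:

\[ B\hat{\alpha} \sim \mathrm{Bin}(B, \alpha^\star), \qquad \mathrm{Var}(\hat{\alpha}) = \frac{\alpha^\star(1-\alpha^\star)}{B}, \qquad \widehat{\mathrm{se}}(\hat{\alpha}) = \sqrt{\frac{\hat{\alpha}(1-\hat{\alpha})}{B}} \]

where the last expression plugs \(\hat{\alpha}\) in place of \(\alpha^\star\). Since the target level \(\alpha\) is a fixed constant, the standard error of the ratio \(\hat{\alpha}/\alpha\) is simply \(\widehat{\mathrm{se}}(\hat{\alpha})/\alpha\).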
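# Estimated size for each correlation level, and its standard error
ahat <- sapply(rhos, size, m = m, B = B, alpha = alpha)
ses <- sqrt(ahat * (1 - ahat) / B)
# Report both relative to the target level
tb <- rbind(ahat/alpha, ses/alpha)
colnames(tb) <- rhos
rownames(tb) <- c("Achieved level/target level", "Standard error")
knitr::kable(tb, digits = 2)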
| \(\rho\) | 0 | 0.1 | 0.2 | 0.4 | 0.8 |
|---|---|---|---|---|---|
| Achieved level/target level | 1.06 | 0.89 | 0.78 | 0.42 | 0.39 |
| Standard error | 0.06 | 0.06 | 0.06 | 0.04 | 0.04 |
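This table illustrates the sharpness of the Simes test under independence (\(\rho=0\)), where the achieved level is within about one standard error of the target level, and the conservativeness of this test under positive dependence (\(\rho>0\)): in this simulation, the achieved level decreases as \(\rho\) grows. For example, when \(\rho=0.4\), the size of the Simes test is less than half the target level.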
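As a sanity check on these numbers, the sketch below recomputes the standard-error row from the rounded ratios in the table, and expresses the gap to the target level in standard errors. The variable names `ratio`, `se_ratio` and `z` are introduced here for illustration; because the inputs are rounded to two digits, the results are approximate.

```r
# Values copied from the table above (rounded to two digits)
alpha <- 0.2
B <- 1000
rhos <- c(0, 0.1, 0.2, 0.4, 0.8)
ratio <- c(1.06, 0.89, 0.78, 0.42, 0.39)   # achieved level / target level

ahat <- ratio * alpha                              # implied estimated sizes
se_ratio <- sqrt(ahat * (1 - ahat) / B) / alpha    # standard error of the ratio
z <- (1 - ratio) / se_ratio                        # gap to the target, in standard errors

print(data.frame(rho = rhos, ratio = ratio,
                 se = round(se_ratio, 2), z = round(z, 1)))
```

The recomputed standard errors round to the values shown in the table. At \(\rho=0\) the deviation from the target is below one standard error, whereas at \(\rho=0.4\) the shortfall exceeds ten standard errors.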
This conservativeness is one of the motivations for the development of post hoc inference methods that use randomization to adapt to dependence. These methods are presented in the other vignettes of the package.
Session information
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.47 matrixStats_1.3.0 sanssouci_0.13.0
##
## loaded via a namespace (and not attached):
## [1] vctrs_0.6.5 cli_3.6.3 rlang_1.1.4 xfun_0.45
## [5] purrr_1.0.2 generics_0.1.3 textshaping_0.4.0 jsonlite_1.8.8
## [9] htmltools_0.5.8.1 ragg_1.3.2 sass_0.4.9 rmarkdown_2.27
## [13] grid_4.4.1 evaluate_0.24.0 jquerylib_0.1.4 fastmap_1.2.0
## [17] matrixTests_0.2.3 yaml_2.3.8 lifecycle_1.0.4 memoise_2.0.1
## [21] compiler_4.4.1 fs_1.6.4 Rcpp_1.0.12 lattice_0.22-6
## [25] systemfonts_1.1.0 digest_0.6.35 R6_2.5.1 magrittr_2.0.3
## [29] Matrix_1.7-0 bslib_0.7.0 tools_4.4.1 pkgdown_2.0.9
## [33] cachem_1.1.0 desc_1.4.3