Confidence curves for structured hypotheses

Introduction

The goal of this vignette is to illustrate the post hoc bounds on the number of true/false positives proposed in Durand et al. (2020) for localized signals. More specifically, we reproduce one of the plots of Figure 12 in Durand et al. (2020) using the R package sanssouci. We will explicitly quote Durand et al. (2020) when relevant.

library("sanssouci")

Objective

We consider $m$ null ordered hypotheses partitioned in intervals of size $s$ . For simplicity we set the number of intervals to be a power of 2: $m = s 2^q$ for some integer $q$ :

s <- 100
q <- 7
m <- s*2^q

Therefore, we have $m = 1.28\times 10^{4}$ . Our goal is to compare three post hoc bounds. These bounds are obtained by interpolation from a reference family where the amount of signal is estimated by probabilistic inequalities, following the general principle laid down by Blanchard, Neuvial, and Roquain (2020), and they differ by the choice of the reference family:

“Simes”: the bound derived from the Simes (1986) inequality as proposed by Goeman and Solari (2011) and further studied by Blanchard, Neuvial, and Roquain (2020). This bound was introduced in a context where the signal is not localized.
“Tree” and “Partition”: two bounds derived from the DKWM inequality (Dvoretzky, Kiefer, and Wolfowitz (1956), Massart (1990)), as proposed in Durand et al. (2020). For the “Partition” bound, the reference family is the original partition $(P_k)_k$ of the $m$ null hypotheses into $K=2^q$ intervals. For the “Tree” bound, the reference family is the perfect binary tree whose leaves are the elements of the original partition.

Settings

We define the following numerical parameters, which characterize the true/false null hypothesis configuration considered in Section 5 of Durand et al. (2020):

K1 <- 8
r <- 0.9
m1 <-  r*K1*s
barmu <- 3

More precisely, quoting Durand et al. (2020):

the false null hypotheses are contained in $P_k$ for $1 \leq k \leq K_1$ , for some fixed value of $K_1$ . The quantity $r$ is defined similarly as in (20), as the fraction of false null hypotheses in those $P_k$ , and is set to $r =0.9$ . All of the other partition pieces only contain true null hypotheses.

We start by creating a binary tree structure and generating the signal:

family <- dyadic.from.window.size(m, s = s, method = 2)
mu <- gen.mu.leaves(m = m, K1 = K1, d = r, grouped = TRUE, setting = "const", 
                    barmu = barmu, leaf_list = family$leaf_list)

This construction is illustrated in the figure below (Figure 11 in Durand et al. (2020)) in the case where $q=3$ .

Figure 11 in Durand et al.

We generate $p$ -values according to the simulation setup in Durand et al. (2020):

The true null $p$ -values are distributed as i.i.d. $\mathcal{N}(0,1)$ , and false null $p$ -values are distributed as i.i.d. $\mathcal{N}(\bar{\mu}, 1)$ , where [ $\bar{\mu}= 3$ ].

pvalues <- gen.p.values(m = m, mu = mu, rho = 0, alternative = "greater")

Calculate confidence curves

The confidence level for post hoc inference is set to $\alpha = 0.05$ .

alpha <- 0.05

Below, we will be considering confidence curves of the form $(k, V(S_k))_{1 \leq k \leq m}$ , where $S_k$ is the set of the $k$ smallest $p$ -values (regardless of the ordering given by the partition). Note that focusing on such sets is a priori favorable to the Simes bound, for which the elements of the reference family are among the $S_k$ .

ord <- order(pvalues)
res <- list()
x <- 1:m
stat <- rep("FP", m)

1- True number of false positives

The true number of false positives will be called “Oracle” bound in the plots below.

label <- rep("Oracle", m)
method <- rep("Oracle", m)
oracle_bound <- cumsum(mu[ord] == 0)
res[["Oracle"]] <- data.frame(x = x, label = label, stat = stat, bound = oracle_bound, method = method)

2- Simes-based confidence curve

Here we use the Simes (1986) inequality to bound the number of false positives in each node of the tree, as proposed by Goeman and Solari (2011) and further studied by Blanchard, Neuvial, and Roquain (2020), both in a context where the signal is not localized.

simes_bound <- curveMaxFP(pvalues, alpha * 1:m / m)
label <- rep("Simes", m)
method <- rep("Simes", m)
res[["Simes"]] <- data.frame(x = x, label = label, stat = stat, bound = simes_bound, method = method)

3- DKWM-based confidence curve

Here we use the DKWM inequality (Dvoretzky, Kiefer, and Wolfowitz (1956), Massart (1990)) to bound the number of false positives in each node of the tree, as suggested in Durand et al. (2020).

label <- rep("DKWM(tree)", m)
method <- rep("Tree", m)
C_tree <- family$C[8]
ZL_tree <- zetas.tree(C_tree, family$leaf_list, zeta.DKWM, pvalues, alpha)
tree_bound <- curve.V.star.forest.fast(ord, C_tree, ZL_tree, family$leaf_list, pruning = TRUE)
res[["Tree"]] <- data.frame(x = x, label = label, stat = stat, bound = tree_bound, method = method)

label <- rep("DKWM(partition)", m)
method <- rep("Partition", m)
ZL_part <- zetas.tree(family$C, family$leaf_list, zeta.DKWM, pvalues, alpha)
part_bound <- curve.V.star.forest.fast(ord, family$C, ZL_part, family$leaf_list, pruning = TRUE)
res[["Partition"]] <- data.frame(x = x, label = label, stat = stat, bound = part_bound, method = method)

Plot confidence curves

library("ggplot2")
dat <- Reduce(rbind, res)
lvls <- c("Oracle", "Partition", "Simes", "Tree", "Hybrid")
cols <- RColorBrewer::brewer.pal(length(lvls), "Set1")
names(cols) <- lvls

Upper bound on the number of false positives

xymax <- m1;
pV <- ggplot(dat, aes(x, bound, colour = method)) + 
    geom_line() +
    ylab("Upper bound on the number of false positives") +
    xlab("sorted hypotheses") +
    scale_colour_manual(values = cols)
pV

The “Tree” and “Partition” bounds are sharper than the “Simes” bound as soon as we are considering “large” sets of hypotheses. The fact that the “Tree” and “Partition” bounds are not as sharp as the “Simes” bound for the first hundred of hypotheses can be explained by our choice of the ordering of the null hypotheses in the sets $S_k$ , which as discussed above is favorable to the “Simes” bound.

Zooming on the first 720 null hypotheses (in the order of the $p$ -values):

pV + coord_cartesian(xlim = c(1, xymax),
                     ylim = c(0, xymax))

Lower bound on the number of true positives

The same information can be displayed as a lower bound on the number of true positives, defined for any $S \subset \{1 \dots m\}$ by $|S| - V(S)$ :

dat$S <- dat$x - dat$bound

xmax <- m1;
ymax <- max(dat$S[m1]);
pS <- ggplot(dat, aes(x, S, colour = method)) + 
    geom_line() +
    ylab("Lower bound on the number of true positives") +
    xlab("sorted hypotheses") +
    scale_colour_manual(values = cols)
pS

Zooming in the first 720 null hypotheses (in the order of the $p$ -values), we recover the middle plot in Figure 12 of Durand et al. (2020).

pS + xlim(1, xmax) + ylim(0, ymax)

Session information

sessionInfo()

## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.5.2    sanssouci_0.16.0
## 
## loaded via a namespace (and not attached):
##  [1] vctrs_0.6.5        cli_3.6.5          knitr_1.50         rlang_1.1.6       
##  [5] xfun_0.52          generics_0.1.4     textshaping_1.0.1  jsonlite_2.0.0    
##  [9] labeling_0.4.3     glue_1.8.0         htmltools_0.5.8.1  ragg_1.4.0        
## [13] sass_0.4.10        scales_1.4.0       rmarkdown_2.29     grid_4.5.1        
## [17] tibble_3.3.0       evaluate_1.0.4     jquerylib_0.1.4    fastmap_1.2.0     
## [21] matrixTests_0.2.3  yaml_2.3.10        lifecycle_1.0.4    compiler_4.5.1    
## [25] RColorBrewer_1.1-3 fs_1.6.6           pkgconfig_2.0.3    farver_2.1.2      
## [29] systemfonts_1.2.3  lattice_0.22-7     digest_0.6.37      R6_2.6.1          
## [33] pillar_1.11.0      magrittr_2.0.3     bslib_0.9.0        Matrix_1.7-3      
## [37] withr_3.0.2        gtable_0.3.6       tools_4.5.1        matrixStats_1.5.0 
## [41] pkgdown_2.1.3      cachem_1.1.0       desc_1.4.3

References

Blanchard, Gilles, Pierre Neuvial, and Etienne Roquain. 2020. “Post Hoc Confidence Bounds on False Positives Using Reference Families.” Annals of Statistics 48 (3): 1281–1303. https://projecteuclid.org/euclid.aos/1594972818.

Durand, Guillermo, Gilles Blanchard, Pierre Neuvial, and Etienne Roquain. 2020. “Post hoc false positive control for structured hypotheses.” Scandinavian Journal of Statistics. https://hal.archives-ouvertes.fr/hal-01829037.

Dvoretzky, Aryeh, Jack Kiefer, and Jacob Wolfowitz. 1956. “Asymptotic Minimax Character of the Sample Distribution Function and of the Classical Multinomial Estimator.” The Annals of Mathematical Statistics 27 (3): 642–69.

Goeman, Jelle J, and Aldo Solari. 2011. “Multiple Testing for Exploratory Research.” Statistical Science 26 (4): 584–97.

Massart, Pascal. 1990. “The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality.” The Annals of Probability 18 (3): 1269–83.

Simes, R J. 1986. “An improved Bonferroni procedure for multiple tests of significance.” Biometrika 73 (3): 751–54.

G. Durand, G. Blanchard, P. Neuvial, E. Roquain

2025-07-25