vignettes/post-hoc_fMRI.Rmd
This vignette shows how to use the sanssouci package to perform permutation-based post hoc inference for fMRI data. We focus on a motor task (left versus right click) from the Localizer data set of Orfanos et al. (2017). The data set is included in the sanssouci.data package; it was obtained using the Python module nilearn, see Abraham et al. (2014).
We first make sure that the required R packages are installed and loaded.
if (!require("ggplot2")) install.packages("ggplot2")
if (!require("sanssouci")) remotes::install_github("sanssouci-org/sanssouci@develop")
if (!require("sanssouci.data")) remotes::install_github("sanssouci-org/sanssouci.data")
We set the seed of the random number generator for numerical reproducibility of the results:
set.seed(20200924)
We load the fMRI data set:
data(fMRI_localizer, package = "sanssouci.data")
Y <- fMRI_localizer
The samples are classified into two subgroups, corresponding to the two motor tasks the subjects were asked to perform: “left click” vs. “right click”:
categ | Freq |
---|---|
left | 15 |
right | 15 |
We start by calculating the Simes bound introduced by Goeman and Solari (2011). This bound has already been applied to fMRI data, see Rosenblatt et al. (2018). In that paper, post hoc inference (by the Simes inequality) is referred to as “All-resolutions inference” (ARI).
alpha <- 0.1
We set the target risk level to \(\alpha = 0.1\), so that the bounds below hold with probability at least \(1 - \alpha = 0.9\). The Simes bound is implemented in the R package ‘cherry’; here, we use the ‘sanssouci’ package, as follows:
groups <- ifelse(colnames(Y) == "left", 0, 1) # map to 0/1
obj <- SansSouci(Y = Y, groups = groups)
res_Simes <- fit(obj, B = 0, family = "Simes", alpha = alpha) ## B=0 => no calibration!
Post hoc bounds are obtained from the predict method:
TP_Simes <- predict(res_Simes, what = "TP")
TP_Simes
#> [1] 88
The Simes bound states that, with probability at least 0.9, the number of voxels associated with the task of interest (among all 48900 voxels) is at least 88.
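To make this construction concrete, here is a minimal base-R sketch of the linear (Simes) post hoc bound on simulated p-values. The helper name `simesTP`, the simulation settings, and the set `S` are ours, for illustration only; they are not part of the package.

```r
# Minimal sketch of the linear (Simes) post hoc bound, on simulated data.
# For a set S with p-values pS among m tests, Blanchard, Neuvial and
# Roquain (2020) bound the number of false positives in S by
#   V(S) <= min_k { #{i in S : p_i > alpha * k / m} + k - 1 },
# so that TP(S) >= |S| - V(S).
simesTP <- function(pS, m, alpha) {
  Vbar <- min(sapply(seq_along(pS), function(k) {
    sum(pS > alpha * k / m) + k - 1
  }))
  length(pS) - Vbar
}

set.seed(123)
m <- 1000
p <- c(rbeta(100, 0.1, 1), runif(m - 100))  # 100 signals among 1000 tests
S <- order(p)[1:200]                        # the 200 smallest p-values
simesTP(p[S], m, alpha = 0.1)               # lower bound on #TP in S
```

With `B = 0` as above, no calibration is performed; the calibrated bounds introduced below replace \(\alpha\) in the thresholds \(\alpha k/m\) by a data-driven \(\lambda\).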
Disregarding (for now) the voxel locations in the brain, we plot confidence bounds on \(p\)-value level sets (corresponding to maps obtained by thresholding at a given test statistic):
As discussed in Blanchard, Neuvial, and Roquain (2020), the above bound is only known to be valid under certain positive dependence assumptions (PRDS) on the joint \(p\)-value distribution. Although the PRDS assumption is widely accepted for fMRI studies (see Genovese, Lazar, and Nichols (2002); Nichols and Hayasaka (2003)), the Simes bound is by construction not adaptive to the specific type of dependence in a particular data set, and we argue (and demonstrate below) that it is overly conservative here.
To bypass these limitations, Blanchard, Neuvial, and Roquain (2020) have proposed a randomization-based procedure known as \(\lambda\)-calibration, which yields tighter bounds that are adapted to the dependency observed in the data set at hand. We note that a related approach has been proposed by Hemerik, Solari, and Goeman (2019). In the case of two-sample tests, this calibration can be achieved by permutation of the class labels, which is readily available via the fit function of the sanssouci package:
B <- 200
res <- fit(obj, alpha = alpha, B = B, family = "Simes")
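As an illustration of what this calibration step does, here is a simplified base-R sketch on simulated two-sample data (all names and settings below are ours, not the package's): for each permutation of the labels, one computes the pivotal statistic \(\min_k m\,p_{(k)}/k\) of the linear template, and \(\lambda\) is taken as the \(\alpha\)-quantile of these statistics.

```r
# Simplified sketch of lambda-calibration for the linear template
# on simulated two-sample data (illustrative, not the package code).
set.seed(1)
m <- 200; n <- 10                       # 200 hypotheses, 10 samples per group
X <- matrix(rnorm(m * 2 * n), m, 2 * n)
X[1:20, 1:n] <- X[1:20, 1:n] + 2        # 20 true signals in group 0
labels <- rep(c(0, 1), each = n)

pvals <- function(X, labels) {
  apply(X, 1, function(x) t.test(x[labels == 0], x[labels == 1])$p.value)
}

B <- 100; alpha <- 0.1
# pivotal statistic of the linear template: min_k m * p_(k) / k
psi <- replicate(B, {
  p <- sort(pvals(X, sample(labels)))   # p-values under permuted labels
  min(m * p / seq_len(m))
})
lambda <- unname(quantile(psi, alpha))  # calibrated lambda
lambda
```

The calibrated thresholds \(t_k = \lambda k/m\) then replace the Simes thresholds \(\alpha k/m\), which is what makes the resulting bound adaptive to the dependence in the data.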
An alternative to the Simes/Linear reference family is the Beta reference family:
K <- 500
res_Beta <- fit(res, alpha = alpha, B = B, family = "Beta", K = K)
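For intuition about this family: the \(k\)-th threshold of the Beta template is a \(\lambda\)-quantile of the Beta\((k, m-k+1)\) distribution, i.e. of the null distribution of the \(k\)-th smallest of \(m\) uniform p-values. A small base-R sketch (the value of \(\lambda\) below is an arbitrary placeholder; in practice it comes from permutation calibration):

```r
# Sketch of the Beta reference family thresholds (lambda is a placeholder;
# in practice it is obtained by permutation calibration).
m <- 500; K <- 50; lambda <- 0.05
k <- seq_len(K)
t_beta <- qbeta(lambda, k, m - k + 1)  # k-th threshold of the Beta template
# corresponding pivotal statistic for calibration:
#   min_{k <= K} pbeta(p_(k), k, m - k + 1)
head(t_beta)
```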
As expected from the theory, the post hoc bounds obtained after calibration are much tighter than the Simes bound:
resList <- list("Simes/ARI" = res_Simes,
"Linear" = res,
"Beta" = res_Beta)
names(resList)[3] <- sprintf("Beta (K=%s)", K)
bounds <- sapply(resList, predict, what = "TP")
knitr::kable(bounds, digits = 2, col.names = c("Lower bound on True Positives"))
 | Lower bound on True Positives |
---|---|
Simes/ARI | 88 |
Linear | 321 |
Beta (K=500) | 480 |
In the next two sections, we illustrate the use of these improved bounds to build confidence curves on the number of true positives among the most significant voxels, and post hoc bounds on user-defined brain regions.
In the absence of prior information on voxels, a natural idea is to rank them by decreasing statistical significance, and a natural question to ask is:
Can we provide a lower confidence curve on the number (or proportion) of truly associated voxels among the most significant ones?
We illustrate the use of post hoc methods to provide this type of information. More specifically, we build confidence statements on the number of true/false positives within the top \(k\) most significant voxels, where \(k\) may be defined by the user after seeing the data, and multiple choices of \(k\) are allowed.
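To see how such a curve arises, here is a base-R sketch with the parametric Simes bound on simulated p-values (illustrative only; the `simesTP` helper and the simulation settings are ours, not the package's):

```r
# Sketch: lower confidence curve on the number of true positives among
# the k most significant tests, using the parametric Simes bound.
simesTP <- function(pS, m, alpha) {
  Vbar <- min(sapply(seq_along(pS), function(k) {
    sum(pS > alpha * k / m) + k - 1
  }))
  length(pS) - Vbar
}

set.seed(42)
m <- 1000
p <- c(rbeta(100, 0.1, 1), runif(m - 100))  # 100 signals among 1000 tests
o <- order(p)
curve <- sapply(1:300, function(k) simesTP(p[o[1:k]], m, alpha = 0.1))
# curve[k] lower-bounds TP among the top k tests, simultaneously over all k:
# any k may be chosen after seeing the data.
```

By construction this curve is nondecreasing in \(k\), and all its values hold simultaneously at confidence level \(1-\alpha\).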
The confidence curves obtained by calibration of the Simes and Beta families can be compared graphically to the (parametric) Simes curve that can be obtained from Goeman and Solari (2011):
conf_bounds <- lapply(resList, predict, all = TRUE)
cols <- RColorBrewer::brewer.pal(length(conf_bounds), "Dark2")
p <- plotConfCurve(conf_bounds, xmax = 700, cols = cols)
p + geom_line(linewidth = 1.5)
In this example, ARI is outperformed by permutation-based approaches, which are able to adapt to unknown dependency: for the same target risk level (\(\alpha = 0.1\)), both permutation-based bounds are much less conservative than the classical Simes/ARI bound.
The goal in this section is to calculate post hoc bound on user-defined brain regions. One definition of such regions is given by the Harvard-Oxford brain atlas, which is available from the sanssouci.data package.
data(brainAtlas, package = "sanssouci.data")
This atlas is made of 48 areas. For each of these areas, we calculate the three post hoc bounds obtained above. Note that by construction of post hoc bounds, these 48 bounds are valid simultaneously (in particular, no further multiple testing correction is required at this stage).
bounds <- apply(brainAtlas, 1, FUN = function(reg) {
S <- which(reg > 0)
c("Region size" = length(S), sapply(resList, predict, S, "TP"))
})
In this particular case, the permutation-based bounds reveal signal in 3 of the 48 areas:
ww <- which(colSums(bounds[3:4, ]) > 0) # areas where the Linear or Beta bound is positive
cap <- sprintf("Lower bounds on the number of true positives in %s brain regions.",
length(ww))
knitr::kable(t(bounds[, ww]), caption = cap)
 | Region size | Simes/ARI | Linear | Beta (K=500) |
---|---|---|---|---|
Precentral Gyrus | 6640 | 79 | 242 | 298 |
Postcentral Gyrus | 5030 | 69 | 194 | 209 |
Supramarginal Gyrus, anterior division | 2017 | 0 | 3 | 0 |
Here again, ARI is outperformed by the permutation-based approaches, which are able to adapt to the unknown dependency: for the same target risk level (\(\alpha = 0.1\)), both permutation-based bounds are substantially less conservative than the classical Simes/ARI bound.
sessionInfo()
#> R version 4.2.2 (2022-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] sanssouci_0.12.8 ggplot2_3.4.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.9 RColorBrewer_1.1-3 highr_0.9
#> [4] bslib_0.4.1 compiler_4.2.2 pillar_1.8.1
#> [7] jquerylib_0.1.4 tools_4.2.2 digest_0.6.30
#> [10] lattice_0.20-45 jsonlite_1.8.3 evaluate_0.18
#> [13] memoise_2.0.1 lifecycle_1.0.3 tibble_3.1.8
#> [16] gtable_0.3.1 pkgconfig_2.0.3 rlang_1.0.6
#> [19] Matrix_1.5-3 cli_3.4.1 yaml_2.3.6
#> [22] pkgdown_2.0.6 xfun_0.34 fastmap_1.1.0
#> [25] withr_2.5.0 stringr_1.4.1 knitr_1.40
#> [28] generics_0.1.3 desc_1.4.2 fs_1.5.2
#> [31] sass_0.4.2 vctrs_0.5.0 systemfonts_1.0.4
#> [34] rprojroot_2.0.3 grid_4.2.2 glue_1.6.2
#> [37] R6_2.5.1 textshaping_0.3.6 fansi_1.0.3
#> [40] rmarkdown_2.18 farver_2.1.1 purrr_0.3.5
#> [43] magrittr_2.0.3 codetools_0.2-18 matrixTests_0.1.9.1
#> [46] matrixStats_0.62.0 scales_1.2.1 htmltools_0.5.3
#> [49] colorspace_2.0-3 labeling_0.4.2 ragg_1.2.4
#> [52] utf8_1.2.2 stringi_1.7.8 munsell_0.5.0
#> [55] cachem_1.0.6
Abraham, Alexandre, Fabian Pedregosa, Michael Eickenberg, Philippe Gervais, Andreas Mueller, Jean Kossaifi, Alexandre Gramfort, Bertrand Thirion, and Gaël Varoquaux. 2014. “Machine Learning for Neuroimaging with Scikit-Learn.” Frontiers in Neuroinformatics 8: 14.
Blanchard, Gilles, Pierre Neuvial, and Etienne Roquain. 2020. “Post Hoc Confidence Bounds on False Positives Using Reference Families.” Annals of Statistics 48 (3): 1281–1303. https://projecteuclid.org/euclid.aos/1594972818.
Genovese, Christopher R, Nicole A Lazar, and Thomas Nichols. 2002. “Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate.” Neuroimage 15 (4): 870–78.
Goeman, Jelle J, and Aldo Solari. 2011. “Multiple Testing for Exploratory Research.” Statistical Science 26 (4): 584–97.
Hemerik, Jesse, Aldo Solari, and Jelle J Goeman. 2019. “Permutation-Based Simultaneous Confidence Bounds for the False Discovery Proportion.” Biometrika 106 (3): 635–49.
Nichols, Thomas, and Satoru Hayasaka. 2003. “Controlling the Familywise Error Rate in Functional Neuroimaging: A Comparative Review.” Statistical Methods in Medical Research 12 (5): 419–46.
Orfanos, Dimitri Papadopoulos, Vincent Michel, Yannick Schwartz, Philippe Pinel, Antonio Moreno, Denis Le Bihan, and Vincent Frouin. 2017. “The Brainomics/Localizer Database.” Neuroimage 144: 309–14.
Rosenblatt, Jonathan D, Livio Finos, Wouter D Weeda, Aldo Solari, and Jelle J Goeman. 2018. “All-Resolutions Inference for Brain Imaging.” Neuroimage 181: 786–96.