New Preprint: Statistical Evidence in Psychological Networks: A Bayesian Analysis of 294 Networks from 126 Studies

Psychometric networks have become a popular tool for multivariate data analysis in psychology and the social sciences. Researchers conceptualize a construct as a network of variables, interpreting the presence or absence of a network edge (i.e., conditional dependence or independence) and the strength of the present edges (i.e., the strength of the partial associations). However, the statistical evidence supporting network findings is generally not evaluated, so it is unknown how robust the results in the network literature are. Bayesian methods allow us to answer this question by estimating the uncertainty about the network edges and the edge weights. Here, we estimate the uncertainty in the network field by analyzing 294 psychometric networks from 126 published papers with the Bayesian approach. We found inconclusive evidence for the presence or absence of one-third of the edges, weak evidence for half, and compelling evidence for less than twenty percent of the edges. Thus, 80% of edges from the analyzed networks lack sufficient support from the data to conclude their presence or absence with confidence. Networks estimated with a high relative sample size, with more than 70 observations per possible edge, had sufficient evidence to conclude the presence or absence of more than half of their edges. Our study shows that networks are often supported by too little evidence from the data for results to be reported with confidence. This does not mean that the results are flawed, but rather that they cannot provide a solid basis for cumulative science. All results are available on the accompanying open-access website ReBayesed, which allows researchers to explore the reanalyzed networks and determine which findings are robust across studies. For details, have a look at the preprint. The reproducibility archive is available here.
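In a Bayesian analysis, evidence for the presence or absence of an edge is commonly quantified with an inclusion Bayes factor. This factor measures how much the data change the odds that the edge is included. The minimal Python sketch below assumes that an edge's posterior inclusion probability has already been obtained from a Bayesian network analysis. The function names are hypothetical. The cut-offs of 3 and 10 for weak and compelling evidence are illustrative assumptions and may not match the thresholds used in the preprint.

```python
def inclusion_bayes_factor(posterior_incl: float, prior_incl: float = 0.5) -> float:
    """Inclusion Bayes factor BF10: posterior inclusion odds divided by prior inclusion odds."""
    for prob in (posterior_incl, prior_incl):
        if not 0.0 < prob < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")
    posterior_odds = posterior_incl / (1.0 - posterior_incl)
    prior_odds = prior_incl / (1.0 - prior_incl)
    return posterior_odds / prior_odds


def evidence_category(bf10: float, weak: float = 3.0, compelling: float = 10.0) -> str:
    """Label the evidence for edge presence (BF10 > 1) or absence (BF10 < 1).

    The default cut-offs are illustrative assumptions, not taken from the preprint.
    """
    if 1.0 / weak < bf10 < weak:
        return "inconclusive"
    direction = "presence" if bf10 >= weak else "absence"
    if bf10 >= compelling or bf10 <= 1.0 / compelling:
        return f"compelling evidence for {direction}"
    return f"weak evidence for {direction}"


# Example: posterior inclusion probabilities for four hypothetical edges
for p_incl in (0.5, 0.8, 0.95, 0.02):
    bf = inclusion_bayes_factor(p_incl)
    print(f"p = {p_incl:.2f}  BF10 = {bf:.2f}  {evidence_category(bf)}")
```

With a prior inclusion probability of 0.5, the prior odds are 1, so BF10 equals the posterior inclusion odds. A different prior inclusion probability yields a different Bayes factor for the same posterior.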

New Paper: Towards a Generative Model of Emotion Dynamics

Most theories of emotion suggest that emotions are reactions to situations we encounter in daily life. Process theories of emotion further specify a feedback loop between our environment, attention, emotions, and action that clarifies the adaptive nature of emotions. In principle, such process theories describe how emotions develop in daily life, and consequently, emotion measures collected from individuals many times a day in studies using the experience sampling methodology should be highly useful in advancing these theories. However, current emotion theories are predominantly verbal theories and therefore do not make clear predictions about such data. In this article, we take a first step toward a generative model of emotion dynamics by formalizing the link between situations and emotions, which provides us with a basic generative model of emotions in daily life. We show that this incomplete model already reproduces nine empirical phenomena in emotion time series related to (temporal) statistical associations between emotions and their distributional form. We then discuss how we can draw on existing (process) theories of emotion to extend our basic model into a complete generative model of emotion dynamics. Finally, we discuss how generative models of emotion dynamics can facilitate theory development and advance measurement, study design, and statistical analysis. For details have a look at the open access publication in Psychological Review. The model and the reproducibility archives are available here and here.

Should you Submit Papers before Christmas? Submission Percentages across Months of the Year

The argument sounds reasonable enough: everyone is trying to wrap up projects before the end of the year, so the number of submissions in December is significantly higher than in earlier months. Assuming that the number of papers sent out for review remains constant across months (which seems reasonable, since resources such as editors and reviewers do not increase in December; if anything, they decrease), this would imply that the desk rejection rate increases in December. Consequently, all else being equal, one should avoid submitting a paper in December. To my surprise, a simple web search was not sufficient to check the premise that more papers are submitted in December. Helpfully, arXiv, a preprint server popular in physics, mathematics, computer science, and quantitative biology, provides monthly submission statistics since 1993.
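Checking the premise requires more than comparing raw monthly counts, because the total submission volume can change from year to year. The minimal Python sketch below assumes the monthly counts have been collected into a dictionary that maps each year to its twelve monthly counts. It converts each month's count to a percentage of that year's total and averages these shares across years. A December share clearly above 100/12 ≈ 8.3% would support the premise. The function name and data layout are hypothetical, and the toy numbers are made up; they are not arXiv data.

```python
from statistics import mean

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def monthly_shares(counts_by_year: dict[int, list[int]]) -> dict[str, float]:
    """Average percentage of each year's submissions that fall in each month.

    Normalizing within a year keeps differences in total volume between years
    from dominating the comparison; incomplete years are skipped.
    """
    shares: dict[str, list[float]] = {month: [] for month in MONTHS}
    for counts in counts_by_year.values():
        if len(counts) != 12:
            continue  # skip years without a full set of monthly counts
        total = sum(counts)
        for month, n in zip(MONTHS, counts):
            shares[month].append(100.0 * n / total)
    if not shares["Jan"]:
        raise ValueError("need at least one complete year of counts")
    return {month: mean(values) for month, values in shares.items()}


# Made-up counts for illustration only (not arXiv data)
toy_counts = {2020: [100] * 11 + [140], 2021: [110] * 11 + [154]}
shares = monthly_shares(toy_counts)
uniform = 100.0 / 12  # share each month would get if submissions were flat
print(f"December: {shares['Dec']:.1f}% vs. uniform {uniform:.1f}%")
```

Steady growth within a year would also push later months slightly above the uniform share. A small December excess is therefore not conclusive on its own.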

New Preprint: A Bayesian Independent Samples T Test for Parameter Differences in Networks of Binary and Ordinal Variables

Multivariate analysis of psychological variables using graphical models has become a standard analysis in the psychometric literature. Most cross-sectional measures are either binary or ordinal, and the methodology for inferring the structure of networks of binary and ordinal variables is developing rapidly. In practice, however, research questions often focus on whether and how networks differ between observed groups. While Bayes factor methods for inferring network structure are well established, a similar methodology for assessing group differences in networks of binary or ordinal variables is currently lacking. In this paper, we extend the Bayesian framework for the analysis of ordinal Markov random fields, a network model for binary and ordinal variables, and develop Bayes factor tests for assessing parameter differences in the networks of two independent groups. The proposed methods are implemented in the R package bgms. We use numerical illustrations to show that the implemented methods work correctly and how well they perform compared to existing methods in situations resembling empirical research. Have a look at the preprint. The reproducibility archive is available here. The bgms package is on CRAN.

New Paper: Climate Actions by Climate and Non-Climate Researchers

Tackling climate change requires both systemic changes and individual lifestyle changes. Are those best placed to understand the risks and solutions to climate change acting on their knowledge? In a large-scale study of N = 9,220 researchers across 115 countries, we found that climate researchers reported engaging in considerably more advocacy and activism on climate change and, to a lesser extent, in more high-impact lifestyle changes than non-climate researchers. For details, have a look at the open access publication in npj Climate Action.

Selecting the Number of Factors in Exploratory Factor Analysis via Out-of-Sample Prediction Errors

Exploratory Factor Analysis (EFA) identifies a number of latent factors that explain correlations between observed variables. A key issue in the application of EFA is the selection of an adequate number of factors. This is a non-trivial problem because more factors always improve the fit of the model. Most methods for selecting the number of factors fall into two categories: either they analyze the patterns of eigenvalues of the correlation matrix, such as parallel analysis; or they frame the selection of the number of factors as a model selection problem and use approaches such as likelihood ratio tests or information criteria.
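The Python/NumPy code below sketches one way to turn a fitted factor model into out-of-sample prediction errors. Its assumptions may differ from the procedure in the paper:

- The factor model is fitted to the training correlation matrix with a plain EM algorithm.
- Each standardized test variable is predicted from the remaining variables, using regression weights implied by the model's covariance matrix.
- The number of factors with the lowest 5-fold cross-validated mean squared error is selected.

All function names are hypothetical. In-sample fit keeps improving as factors are added. The held-out error, by contrast, stops improving once extra factors mostly fit noise.

```python
import numpy as np


def fit_factor_model(S, k, n_iter=1000, tol=1e-8):
    """Fit a k-factor model to a covariance matrix S with the EM algorithm.

    Returns loadings L (p x k) and unique variances psi (length p), so that the
    model-implied covariance matrix is L @ L.T + diag(psi).
    """
    eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    L = eigvecs[:, -k:] * np.sqrt(np.maximum(eigvals[-k:], 1e-6))
    psi = np.maximum(np.diag(S) - np.sum(L**2, axis=1), 1e-3)
    for _ in range(n_iter):
        beta = L.T @ np.linalg.inv(L @ L.T + np.diag(psi))  # E[z | x] = beta @ x
        ezz = np.eye(k) - beta @ L + beta @ S @ beta.T       # average E[z z' | x]
        L_new = S @ beta.T @ np.linalg.inv(ezz)
        psi = np.maximum(np.diag(S - L_new @ beta @ S), 1e-6)  # keep unique variances positive
        converged = np.max(np.abs(L_new - L)) < tol
        L = L_new
        if converged:
            break
    return L, psi


def prediction_error(sigma, Z):
    """Mean squared error when each column of Z is predicted linearly from the
    other columns, using the regression weights implied by covariance matrix sigma."""
    p = Z.shape[1]
    errors = []
    for j in range(p):
        rest = [i for i in range(p) if i != j]
        weights = np.linalg.solve(sigma[np.ix_(rest, rest)], sigma[rest, j])
        errors.append(np.mean((Z[:, j] - Z[:, rest] @ weights) ** 2))
    return float(np.mean(errors))


def select_n_factors(X, max_k=4, n_folds=5, seed=0):
    """Choose the number of factors with the lowest cross-validated prediction error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    cv_errors = []
    for k in range(1, max_k + 1):
        fold_errors = []
        for test_idx in folds:
            train = np.delete(X, test_idx, axis=0)
            mu, sd = train.mean(axis=0), train.std(axis=0)
            Z_train, Z_test = (train - mu) / sd, (X[test_idx] - mu) / sd
            L, psi = fit_factor_model(np.cov(Z_train, rowvar=False, bias=True), k)
            fold_errors.append(prediction_error(L @ L.T + np.diag(psi), Z_test))
        cv_errors.append(float(np.mean(fold_errors)))
    return int(np.argmin(cv_errors)) + 1, cv_errors


# Simulated two-factor data: items 1-4 load on factor 1, items 5-8 on factor 2
rng = np.random.default_rng(1)
loadings = np.zeros((8, 2))
loadings[:4, 0] = 0.8
loadings[4:, 1] = 0.8
X = rng.standard_normal((1000, 2)) @ loadings.T + 0.6 * rng.standard_normal((1000, 8))
best_k, errors = select_n_factors(X)
print(best_k, np.round(errors, 3))
```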

The Impact of Ordinal Scales on Gaussian Mixture Recovery

Gaussian Mixture Models (GMMs) and their special cases, Latent Profile Analysis and k-Means, are popular and versatile tools for exploring heterogeneity in multivariate continuous data. However, they assume that the observed data are continuous, an assumption that is often not met. For example, the severity of symptoms of diseases is often measured in ordinal categories such as not at all, several days, more than half the days, and nearly every day, and survey questions are often answered on ordinal scales such as strongly disagree, disagree, neutral, agree, and strongly agree. In this blog post, I summarize a paper that investigates to what extent estimating GMMs is robust against observing ordinal instead of continuous variables.
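The Python sketch below makes the problem concrete. It draws data from a known one-dimensional, two-component Gaussian mixture and then records only which of four ordered categories each value falls into, as a four-point response scale would. The component means, standard deviations, and thresholds are arbitrary assumptions for illustration and are not the simulation design of the paper. The function names are hypothetical. After discretization the data take only four distinct values, so a GMM fitted to them sees a few stacked points rather than two bell curves.

```python
import bisect
import random
from collections import Counter


def simulate_mixture(n, means, sds, weights, seed=None):
    """Draw n (component, value) pairs from a one-dimensional Gaussian mixture."""
    rng = random.Random(seed)
    components = rng.choices(range(len(weights)), weights=weights, k=n)
    return [(c, rng.gauss(means[c], sds[c])) for c in components]


def to_ordinal(value, thresholds):
    """Map a continuous value to category 0..len(thresholds) using sorted cut points."""
    return bisect.bisect_right(thresholds, value)


# Two components observed on a 4-point scale
# (e.g. 0 = "not at all", ..., 3 = "nearly every day"); thresholds are assumptions
thresholds = [-1.5, 0.0, 1.5]
data = simulate_mixture(2000, means=[-1.0, 1.0], sds=[0.5, 0.5], weights=[0.5, 0.5], seed=1)
for component in (0, 1):
    cats = Counter(to_ordinal(x, thresholds) for c, x in data if c == component)
    print(component, sorted(cats.items()))
```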

Computing Odds Ratios from Mixed Graphical Models

Interpreting statistical network models typically involves interpreting individual edge parameters. If the network model is a Gaussian Graphical Model (GGM), the interpretation is relatively simple: the pairwise interaction parameters are partial correlations, which indicate conditional linear relationships and vary from -1 to 1. Using the standard deviations of the two involved variables, the partial correlation can also be transformed into a linear regression coefficient (see for example here). However, when studying interactions involving categorical variables, such as in an Ising model or a Mixed Graphical Model (MGM), the parameters are not limited to a certain range and their interpretation is less intuitive. In these situations it may be helpful to report the interactions between variables in terms of odds ratios.
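For the Ising model, the link to odds ratios is direct. Code the variables 0/1 and write P(x) ∝ exp(Σ_i μ_i x_i + Σ_{i<j} θ_ij x_i x_j). Then the odds ratio between x_a and x_b, holding all other variables fixed, equals exp(θ_ab), because every term that does not involve both variables cancels. The Python sketch below checks this by brute force on a small model with made-up parameters; the function names are hypothetical. With -1/1 coding, the same calculation gives exp(4θ_ab) instead, so the coding must be known before converting parameters.

```python
import math


def log_potential(x, mu, theta):
    """sum_i mu_i x_i + sum_{i<j} theta_ij x_i x_j for a binary configuration x."""
    p = len(x)
    value = sum(mu[i] * x[i] for i in range(p))
    value += sum(theta[i][j] * x[i] * x[j] for i in range(p) for j in range(i + 1, p))
    return value


def conditional_odds_ratio(a, b, others, mu, theta):
    """Odds ratio between x_a and x_b with all other variables fixed at `others`
    (a dict index -> 0/1). The normalizing constant cancels, so it is never computed."""
    def weight(xa, xb):
        x = [others.get(i, 0) for i in range(len(mu))]
        x[a], x[b] = xa, xb
        return math.exp(log_potential(x, mu, theta))
    return weight(1, 1) * weight(0, 0) / (weight(1, 0) * weight(0, 1))


mu = [0.2, -0.5, 0.1]
theta = [[0.0, 0.8, -0.3],
         [0.8, 0.0, 0.4],
         [-0.3, 0.4, 0.0]]
for x2 in (0, 1):
    # The odds ratio between x0 and x1 is the same whatever value x2 takes
    print(x2, conditional_odds_ratio(0, 1, {2: x2}, mu, theta), math.exp(0.8))
```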

Estimating Group Differences in Network Models using Moderation

Researchers are often interested in comparing statistical network models across groups. For example, Fritz and colleagues compared the relations between resilience factors in a network model for adolescents who did experience childhood adversity to those who did not. Several methods are already available to perform such comparisons. The Network Comparison Test (NCT) performs a permutation test to decide for each parameter whether it differs across two groups. The Fused Graphical Lasso (FGL) uses a lasso penalty to estimate group differences in Gaussian Graphical Models (GGMs). And the BGGM package allows one to test and estimate differences in GGMs in a Bayesian setting. In a recent preprint, I proposed an additional method based on moderation analysis which has the advantage that it can be applied to essentially any network model and at the same time allows for comparisons across more than two groups.
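The core idea of a moderation-based comparison can be shown with a single edge. Add a group indicator and its product with the predictor to a node-wise regression; the coefficient of the product term then estimates how much the edge differs between groups. The minimal Python/NumPy sketch below uses simulated data, plain least squares, and two groups. The function name is hypothetical. The preprint's actual estimation details are not reproduced here, for instance regularization, how node-wise estimates are combined, or how more than two groups are handled.

```python
import numpy as np


def fit_group_moderation(x, y, group):
    """Least-squares fit of y ~ 1 + x + g + x*g with g a 0/1 group indicator.

    The coefficient of x*g is the difference in the x-y slope between groups.
    """
    X = np.column_stack([np.ones_like(x), x, group, x * group])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return {"intercept": coef[0], "slope_group0": coef[1],
            "intercept_diff": coef[2], "slope_diff": coef[3]}


# Simulated data: the x-y slope is 0.3 in group 0 and 0.6 in group 1
rng = np.random.default_rng(42)
n = 5000
group = rng.integers(0, 2, size=n).astype(float)
x = rng.standard_normal(n)
y = np.where(group == 1, 0.6, 0.3) * x + rng.standard_normal(n)
est = fit_group_moderation(x, y, group)
print({name: round(float(value), 3) for name, value in est.items()})
```

With more than two groups, one would add one indicator and one product term per non-reference group.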

Estimating Time-varying Vector Autoregressive (VAR) Models

Models for individual subjects are becoming increasingly popular in psychological research. One reason is that it is difficult to make inferences from between-person data to within-person processes. Another is that time series obtained from individuals are becoming increasingly available due to the ubiquity of mobile devices. The central goal of so-called idiographic modeling is to tap into the within-person dynamics underlying psychological phenomena. With this goal in mind, many researchers have set out to analyze the multivariate dependencies in within-person time series. The simplest and most popular model for such dependencies is the first-order Vector Autoregressive (VAR) model, in which each variable at the current time point is predicted by (a linear function of) all variables (including itself) at the previous time point.
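Written out, the VAR(1) model is Y_t = c + Φ Y_{t-1} + ε_t, where Φ[i, j] is the effect of variable j at time t-1 on variable i at time t. The Python/NumPy sketch below simulates such a process and recovers Φ with least squares. It also includes kernel-weighted least squares centered on a time point of interest, which is one common way to make the estimate time-varying. The Gaussian kernel and the hand-picked bandwidth are assumptions and are not necessarily the estimator used in the post. All function names are hypothetical.

```python
import numpy as np


def simulate_var1(phi, n, noise_sd=1.0, seed=0):
    """Simulate n time points from Y_t = phi @ Y_{t-1} + noise, starting at zero."""
    rng = np.random.default_rng(seed)
    Y = np.zeros((n, phi.shape[0]))
    for t in range(1, n):
        Y[t] = phi @ Y[t - 1] + noise_sd * rng.standard_normal(phi.shape[0])
    return Y


def estimate_var1(Y, weights=None):
    """(Weighted) least-squares estimates of the intercepts c and the matrix phi."""
    X = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])  # intercept + lagged values
    targets = Y[1:]
    if weights is not None:
        sw = np.sqrt(weights)[:, None]  # scaling rows by sqrt(w) gives weighted LS
        X, targets = X * sw, targets * sw
    B, *_ = np.linalg.lstsq(X, targets, rcond=None)  # shape (p + 1, p)
    return B[0], B[1:].T  # row i of phi holds the lagged effects on variable i


def estimate_tvvar1(Y, t0, bandwidth):
    """Local VAR(1) estimate at time t0 using Gaussian kernel weights (an assumption)."""
    t = np.arange(1, len(Y))  # time index of each outcome row Y[1:]
    weights = np.exp(-0.5 * ((t - t0) / bandwidth) ** 2)
    return estimate_var1(Y, weights)


phi_true = np.array([[0.5, 0.2],
                     [0.0, 0.4]])
Y = simulate_var1(phi_true, n=5000)
c_hat, phi_hat = estimate_var1(Y)
_, phi_local = estimate_tvvar1(Y, t0=2500, bandwidth=300)
print(np.round(phi_hat, 2))
print(np.round(phi_local, 2))
```

The bandwidth trades off bias against variance. A wide kernel approaches the ordinary VAR estimate, while a narrow one tracks changes more closely but gives noisier estimates.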

Moderated Network Models for Continuous Data

Statistical network models have become a popular exploratory data analysis tool in psychology and related disciplines, allowing researchers to study relations between variables. The most popular models in this emerging literature are the binary-valued Ising model and the multivariate Gaussian distribution for continuous variables, both of which model interactions between pairs of variables. In these pairwise models, the interaction between any pair of variables A and B is a constant and therefore does not depend on the values of any of the variables in the model. Put differently, none of the pairwise interactions is moderated. However, in highly complex and contextualized fields like psychology, such moderation effects are often plausible. In this blog post, I show how to fit, analyze, visualize, and assess the stability of Moderated Network Models for continuous data with the R package mgm.

Regression with Interaction Terms - How Centering Predictors influences Main Effects

Centering predictors in a regression model with only main effects has no influence on the main effects. In contrast, in a regression model including interaction terms centering predictors does have an influence on the main effects. After getting confused by this, I read this nice paper by Afshartous & Preston (2011) on the topic and played around with the examples in R. I summarize the resulting notes and code snippets in this blogpost.
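The effect follows from a short rearrangement. Take y = b0 + b1·x + b2·z + b3·x·z and substitute x = x_c + m_x and z = z_c + m_z. The result is a model in the centered predictors with main effects b1 + b3·m_z and b2 + b3·m_x, while the interaction b3 is unchanged. In other words, b1 is the slope of x where z = 0, and centering moves that reference point to the mean of z. The Python/NumPy check below fits both versions to simulated data; the data-generating values are arbitrary assumptions.

```python
import numpy as np


def ols(predictors, y):
    """Least-squares coefficients for y ~ 1 + predictors (intercept first)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(1)
n = 1000
x = rng.normal(2.0, 1.0, n)
z = rng.normal(5.0, 1.0, n)
y = 1.0 + 0.5 * x + 0.3 * z + 0.4 * x * z + rng.normal(0.0, 1.0, n)
xc, zc = x - x.mean(), z - z.mean()

# Main effects only: centering leaves the slopes untouched
print(ols([x, z], y)[1:], ols([xc, zc], y)[1:])

# With the interaction: the main effects change, the interaction does not
raw = ols([x, z, x * z], y)
centered = ols([xc, zc, xc * zc], y)
print(raw[1:], centered[1:])
print(raw[1] + raw[3] * z.mean(), raw[2] + raw[3] * x.mean())  # equal centered[1], centered[2]
```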

Deconstructing 'Measurement error and the replication crisis'

Yesterday, I read ‘Measurement error and the replication crisis’ by Eric Loken and Andrew Gelman, which left me puzzled. The first part of the paper consists of general statements about measurement error. The second part consists of the claim that, in the presence of measurement error, we overestimate the true effect when the sample size is small. This sounded wrong enough that I asked the authors for their simulation code and spent a couple of hours figuring out what they did in their paper. I am offering a short and a long version.
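As a point of reference for the measurement-error part of the argument, the Python sketch below simulates the textbook attenuation effect. If two true scores correlate rho and each is observed with reliability r, the population correlation between the observed scores is rho·r. The sketch does not reproduce Loken and Gelman's simulation and takes no position on their small-sample claim. Sample size, correlation, and reliability are arbitrary assumptions, and the function names are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def simulate_observed(n, rho, reliability, rng):
    """Observed scores = true scores (variance 1, correlation rho) + independent error
    whose variance is chosen so that each observed score has the given reliability."""
    error_sd = math.sqrt((1.0 - reliability) / reliability)
    xs, ys = [], []
    for _ in range(n):
        true_x = rng.gauss(0.0, 1.0)
        true_y = rho * true_x + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        xs.append(true_x + error_sd * rng.gauss(0.0, 1.0))
        ys.append(true_y + error_sd * rng.gauss(0.0, 1.0))
    return xs, ys


rng = random.Random(7)
xs, ys = simulate_observed(100_000, rho=0.6, reliability=0.5, rng=rng)
print(round(pearson(xs, ys), 3), 0.6 * 0.5)  # observed vs. attenuated population value
```

Repeating the call with a small n, say 20, shows how widely single estimates scatter around the attenuated value. Small samples are the regime the second part of the paper is about.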

Predictability in Network Models

Network models have become a popular way to abstract complex systems and gain insights into relational patterns among observed variables in many areas of science. The majority of these applications focus on analyzing the structure of the network. However, if the network is not directly observed (Alice and Bob are friends) but estimated from data (there is a relation between smoking and cancer), we can analyze, in addition to the network structure, the predictability of the nodes in the network. That is, we would like to know: how well can a given node in the network be predicted by all remaining nodes in the network?
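For Gaussian variables with linear relations, one common predictability measure is the proportion of explained variance, R². It can be read off the inverse of the correlation matrix: R²_j = 1 - 1/(R⁻¹)_jj. The minimal Python/NumPy sketch below assumes a known correlation matrix, and the function name is hypothetical. Categorical nodes require other measures. Predictability estimated from a fitted network, for example via node-wise regressions, will generally differ somewhat from this population formula.

```python
import numpy as np


def predictability(R):
    """R^2 of each variable when predicted linearly by all other variables,
    computed from a positive definite correlation matrix R."""
    precision = np.linalg.inv(np.asarray(R, dtype=float))
    return 1.0 - 1.0 / np.diag(precision)


R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
print(np.round(predictability(R), 3))
```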

Interactions between Categorical Variables in Mixed Graphical Models

In a previous post, we estimated a Mixed Graphical Model (MGM) on a dataset of mixed variables describing different aspects of the life of individuals diagnosed with Autism Spectrum Disorder, using the mgm package. For interactions between continuous variables, the weighted adjacency matrix fully describes the underlying interaction parameter. Correspondingly, the parameters are fully represented in the graph visualization: the width of the edges is proportional to the absolute value of the parameter, and the edge color indicates the sign of the parameter. This means that we can clearly interpret an edge between two continuous variables as a positive or negative linear relationship of some strength.
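With categorical variables, a single edge stands for a whole table of parameters. Suppose, hypothetically, that variable A has three categories and variable B has two. Their interaction can then involve several parameters, for example one per combination of categories, some positive and some negative, so no single sign describes the edge. The Python sketch below uses a made-up parameter table and collapses it into one nonnegative weight by averaging absolute values. This aggregation rule is an assumption for illustration; check the mgm documentation for how the package actually computes its weighted adjacency matrix. The function names are hypothetical.

```python
def aggregate_edge_weight(params):
    """Collapse a table of interaction parameters into one nonnegative edge weight
    (assumption: the mean absolute value)."""
    values = [abs(v) for row in params for v in row]
    return sum(values) / len(values)


def has_single_sign(params):
    """True if all nonzero parameters share a sign, so the edge could be given one."""
    nonzero = [v for row in params for v in row if v != 0]
    return all(v > 0 for v in nonzero) or all(v < 0 for v in nonzero)


# Hypothetical parameters: rows = categories of A, columns = categories of B
params_ab = [[0.4, -0.4],
             [0.0, 0.0],
             [-0.2, 0.2]]
print(aggregate_edge_weight(params_ab), has_single_sign(params_ab))
print(has_single_sign([[0.35]]))  # a single continuous-continuous parameter
```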

Estimating Mixed Graphical Models

Determining conditional independence relationships through undirected graphical models is a key component in the statistical analysis of complex observational data in a wide variety of disciplines. In many situations, one seeks to estimate the underlying graphical model of a dataset that includes variables of different domains (for example, continuous and categorical variables).

