New Preprint: A Stochastic Block Prior for Clustering in Graphical Models

Graphical models facilitate the representation of psychological variables as complex systems of interacting variables structured as a network. However, their existing statistical analyses overlook the assumption of clustering—the grouping of subsets of variables that are more densely connected within the network—despite its central role in many psychological theories. We address this gap by proposing the use of the Stochastic Block Model (SBM) as a prior distribution on the network structure of a graphical model for binary and ordinal data. The SBM assumes that variables belong to latent clusters, where the probability of an edge depends on the cluster membership of the nodes. Embedding this prior in a Bayesian graphical modeling framework allows researchers to formally incorporate theoretical expectations about clustering, test hypotheses about the number of clusters, and estimate cluster assignments from cross-sectional data. We demonstrate the benefits of this approach in a simulation study and reanalyze 30 openly available empirical datasets to test for clustering. This work highlights how the Bayesian framework can embed theoretical assumptions into network models via priors and introduces a new tool for latent cluster inference in psychological network analysis. [Preprint]
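To make the core assumption concrete, here is a minimal sketch of the generic Stochastic Block Model: each node has a cluster label, and an edge is drawn with a probability that depends only on the two labels. The function name `sample_sbm`, the cluster sizes, and the block probabilities are illustrative assumptions, not the parameterization used in the preprint.

```python
import numpy as np

def sample_sbm(cluster, block_probs, seed=0):
    """Draw an undirected graph from a generic SBM (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    cluster = np.asarray(cluster)
    n = len(cluster)
    # Edge probability for every pair of nodes, looked up by cluster labels.
    edge_prob = block_probs[cluster[:, None], cluster[None, :]]
    # Sample the upper triangle only, then mirror it to keep the graph symmetric.
    upper = np.triu(rng.random((n, n)) < edge_prob, k=1)
    return (upper | upper.T).astype(int)

# Hypothetical setup: two clusters of 30 nodes, dense within, sparse between.
cluster = np.repeat([0, 1], 30)
block_probs = np.array([[0.8, 0.1],
                        [0.1, 0.8]])
adj = sample_sbm(cluster, block_probs)

same_cluster = cluster[:, None] == cluster[None, :]
off_diagonal = ~np.eye(len(cluster), dtype=bool)
within_density = adj[same_cluster & off_diagonal].mean()
between_density = adj[~same_cluster].mean()
print(within_density, between_density)
```

With these assumed settings, the within-cluster edge density should come out far higher than the between-cluster density, which is the clustering pattern the prior encodes.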

Read More

New Preprint: Enhancing Scale Development: Pseudo Factor Analysis of Language Embedding Similarity Matrices

We build on recent work using Large Language Models (LLMs) in psychometrics to generate pseudo-discrimination parameters. While earlier work focused on pseudo-discrimination at the item-by-construct level, we introduce Pseudo-Factor Analysis to support scale design. It is a data-free, model-based approach to evaluating key aspects of a latent construct’s measurement model, such as dimensionality and relations between factors and indicators. In two studies using Five- and Six-factor personality frameworks, various sentence transformer models, and three encoding methods (atomic, atomic reversed, and macro), pseudo-factor analyses recovered theoretically expected structures. These structures aligned closely with empirical factor structures based on human rating data from prior research. We propose Pseudo-Factor Analysis as a useful method for evaluating and refining items after generation and before trialing. A Shiny app is provided to compute pseudo-factor parameters and related psychometric estimates. [Preprint]

Read More

New Preprint: Modeling Qualitative Between-Person Heterogeneity in Time-Series using Latent Class Vector Autoregressive Models

Time-series data have become ubiquitous in psychological research because they allow us to study within-person dynamics and their heterogeneity across persons. Vector autoregressive (VAR) models have become a popular choice as a first approximation of within-person dynamics. The VAR model of each person and the heterogeneity across persons can be jointly modeled using a hierarchical model that captures heterogeneity as a latent distribution. Currently, the most popular choice for this is the multilevel VAR model, which models heterogeneity across persons as quantitative variation through a multivariate Gaussian distribution. Here, we discuss an alternative, the latent class VAR model, which models heterogeneity as qualitative variation using a number of discrete clusters. While this model has been introduced before, it has not been readily accessible to researchers. We change this with this paper, in which we provide an accessible introduction to latent class VAR models; evaluate, in a simulation study, how well this model can be estimated in situations resembling applied research; introduce a new R package ClusterVAR, which provides easy-to-use functions to estimate the model; and provide a fully reproducible tutorial on modeling emotion dynamics, which walks the reader through all steps of estimating, analyzing, and interpreting latent class VAR models. [Preprint]

Read More

New Preprint: Statistical Evidence in Psychological Networks: A Bayesian Analysis of 294 Networks from 126 Studies

Psychometric networks are widely used to analyze multivariate data in psychology and the social sciences. Researchers interpret constructs as networks of variables, focusing on the presence (or absence) and strength of edges, i.e., conditional independencies and partial associations. However, the statistical support for these findings is rarely evaluated, leaving their robustness unclear. Bayesian methods can address this by estimating uncertainty about edges and their weights. We applied this approach to 294 networks from 126 published papers. Results showed inconclusive evidence for one-third of edges, weak evidence for half, and strong evidence for fewer than 20%. Overall, 80% of edges lacked sufficient support to confidently conclude presence or absence. Networks with high relative sample sizes (over 70 observations per edge) were more robust, supporting over half of their edges. These findings suggest that many reported networks rest on limited evidence. This does not mean that the results are flawed, but rather that they alone do not support strong conclusions. An open-access website, ReBayesed, allows researchers to explore all results and identify robust findings. For details, have a look at the preprint. The reproducibility archive is available here.

Read More

New Paper: Towards a Generative Model of Emotion Dynamics

Most emotion theories view emotions as reactions to situations in daily life. Process theories go further, proposing a feedback loop between environment, attention, emotion, and action that explains emotions’ adaptive role. Experience sampling data, which captures emotions in real time, should be ideal for testing such theories. However, existing emotion theories are largely verbal and lack precise predictions for these data. Here, we take a first step toward a generative model of emotion dynamics by formalizing the link between situations and emotions. This basic model already reproduces nine empirical phenomena in emotion time series, including temporal associations and distributional patterns. We then show how process theories can inform extensions of this model into a fuller generative account. Finally, we discuss how such models can support theory development, improve measurement, and guide study design and analysis. See the open access Psychological Review paper for details. The model and reproducibility materials are available here and here.

Read More

Should you Submit Papers before Christmas? Submission Distribution Across the Year

The argument sounds reasonable enough: Everyone is trying to wrap up projects before the end of the year, so the number of submissions in December is significantly higher than in earlier months. Assuming that the number of papers sent out for review remains constant across months (which seems reasonable, since resources such as editors and reviewers do not increase in December—indeed, the opposite), this would imply that the desk rejection rate increases in December. And consequently, all else being equal, one should avoid submitting a paper in December. To my surprise, a simple web search was not sufficient to check the premise that more papers are submitted in December. Helpfully, arxiv.org, a preprint server popular in physics, mathematics, computer science, and quantitative biology, provides monthly submission statistics since 1993.

Read More

New Preprint: A Bayesian Test for Group Differences in Networks of Binary and Ordinal Variables

Multivariate analysis of psychological variables using graphical models has become a standard analysis in psychometrics. Most cross-sectional measures are either binary or ordinal, and methods for inferring the structure of networks of binary and ordinal variables are developing rapidly. Research questions often focus on whether and how networks differ between observed groups. While Bayes factor methods for inferring network structure are well established, a similar methodology for assessing group differences in networks of binary or ordinal variables is currently lacking. In this paper, we extend the Bayesian framework for the analysis of ordinal Markov random fields, a network model for binary and ordinal variables, and develop Bayes factor tests for assessing parameter differences in the networks of two independent groups. The proposed methods are implemented in the R package bgms, and we use numerical illustrations to show that the implemented methods work correctly and how well the methods work compared to existing methods in situations resembling empirical research. Have a look at the preprint. The reproducibility archive is available here. The bgms package is on CRAN.

Read More

New Paper: Climate Actions by Climate and Non-Climate Researchers

Tackling climate change requires both systemic changes and individual lifestyle changes. Are those best placed to understand the risks and solutions to climate change acting on their knowledge? In a large-scale study of $N=9220$ researchers across $115$ countries, we found that climate researchers reported engaging in considerably more advocacy and activism on climate change and, to a lesser extent, high-impact lifestyle changes than non-climate researchers. For details have a look at the open access publication in NPJ Climate Action.

Read More

Selecting the Number of Factors in Exploratory Factor Analysis via out-of-sample Prediction Errors

Exploratory Factor Analysis (EFA) identifies a number of latent factors that explain correlations between observed variables. A key issue in the application of EFA is the selection of an adequate number of factors. This is a non-trivial problem because more factors always improve the fit of the model. Most methods for selecting the number of factors fall into two categories: either they analyze the patterns of eigenvalues of the correlation matrix, such as parallel analysis; or they frame the selection of the number of factors as a model selection problem and use approaches such as likelihood ratio tests or information criteria.

Read More

The Impact of Ordinal Scales on Gaussian Mixture Recovery

Gaussian Mixture Models (GMMs) and their special cases, Latent Profile Analysis and k-Means, are popular and versatile tools for exploring heterogeneity in multivariate continuous data. However, they assume that the observed data are continuous, an assumption that is often not met: for example, the severity of symptoms of diseases is often measured in ordinal categories such as not at all, several days, more than half the days, and nearly every day, and survey questions are often assessed using ordinal responses such as strongly disagree, disagree, neutral, agree, and strongly agree. In this blog post, I summarize a paper which investigates to what extent estimating GMMs is robust against observing ordinal instead of continuous variables.

Read More

Computing Odds Ratios from Mixed Graphical Models

Interpreting statistical network models typically involves interpreting individual edge parameters. If the network model is a Gaussian Graphical Model (GGM), the interpretation is relatively simple: the pairwise interaction parameters are partial correlations, which indicate conditional linear relationships and vary from -1 to 1. Using the standard deviations of the two involved variables, the partial correlation can also be transformed into a linear regression coefficient (see for example here). However, when studying interactions involving categorical variables, such as in an Ising model or a Mixed Graphical Model (MGM), the parameters are not limited to a certain range and their interpretation is less intuitive. In these situations it may be helpful to report the interactions between variables in terms of odds ratios.
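As a small illustration of the odds-ratio idea, the sketch below enumerates all states of a three-node Ising model and checks that the conditional odds ratio between two nodes equals the exponentiated pairwise parameter. It assumes {0, 1} coding and the parameterization written in the comments; the helper names and parameter values are hypothetical, and other codings or the parameterization of a fitted MGM will need a different conversion.

```python
import itertools
import numpy as np

def ising_joint(thresholds, weights):
    """Joint distribution of an Ising model with {0, 1} coding (sketch).

    Assumed form: P(x) proportional to
    exp(sum_i tau_i * x_i + sum_{i<j} theta_ij * x_i * x_j).
    """
    p = len(thresholds)
    states = np.array(list(itertools.product([0, 1], repeat=p)))
    upper = np.triu(weights, k=1)  # count each pair once
    potentials = states @ thresholds + np.einsum("si,ij,sj->s", states, upper, states)
    probs = np.exp(potentials)
    return states, probs / probs.sum()

def conditional_odds_ratio(states, probs, i, j, rest):
    """Odds ratio between nodes i and j with the remaining nodes fixed to `rest`."""
    def prob(xi, xj):
        mask = (states[:, i] == xi) & (states[:, j] == xj)
        for node, value in rest.items():
            mask &= states[:, node] == value
        return probs[mask].sum()
    return (prob(1, 1) * prob(0, 0)) / (prob(1, 0) * prob(0, 1))

# Hypothetical parameters for three binary variables.
thresholds = np.array([-0.5, 0.2, 0.1])
weights = np.array([[0.0, 0.7, -0.3],
                    [0.7, 0.0, 0.4],
                    [-0.3, 0.4, 0.0]])
states, probs = ising_joint(thresholds, weights)

or_given_0 = conditional_odds_ratio(states, probs, 0, 1, rest={2: 0})
or_given_1 = conditional_odds_ratio(states, probs, 0, 1, rest={2: 1})
print(or_given_0, or_given_1, np.exp(weights[0, 1]))
```

In a purely pairwise model the odds ratio does not depend on the value at which the third variable is fixed, so both conditional odds ratios agree with the exponentiated parameter.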

Read More

Estimating Group Differences in Network Models using Moderation

Researchers are often interested in comparing statistical network models across groups. For example, Fritz and colleagues compared the relations between resilience factors in a network model for adolescents who did experience childhood adversity to those who did not. Several methods are already available to perform such comparisons. The Network Comparison Test (NCT) performs a permutation test to decide for each parameter whether it differs across two groups. The Fused Graphical Lasso (FGL) uses a lasso penalty to estimate group differences in Gaussian Graphical Models (GGMs). And the BGGM package allows one to test and estimate differences in GGMs in a Bayesian setting. In a recent preprint, I proposed an additional method based on moderation analysis which has the advantage that it can be applied to essentially any network model and at the same time allows for comparisons across more than two groups.

Read More

Estimating Time-varying Vector Autoregressive (VAR) Models

Models for individual subjects are becoming increasingly popular in psychological research. One reason is that it is difficult to make inferences from between-person data to within-person processes. Another is that time series obtained from individuals are becoming increasingly available due to the ubiquity of mobile devices. The central goal of so-called idiographic modeling is to tap into the within-person dynamics underlying psychological phenomena. With this goal in mind, many researchers have set out to analyze the multivariate dependencies in within-person time series. The simplest and most popular model for such dependencies is the first-order Vector Autoregressive (VAR) model, in which each variable at the current time point is predicted by (a linear function of) all variables (including itself) at the previous time point.
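The first-order VAR model can be sketched in a few lines: simulate a series in which each time point is a linear function of the previous one plus noise, then recover the coefficient matrix by regressing every variable on all lagged variables. This is a stationary (not time-varying) toy example; the function names, the coefficient matrix, and the series length are assumptions made for illustration.

```python
import numpy as np

def simulate_var1(A, n_steps, noise_sd=1.0, seed=0):
    """Simulate x_t = A @ x_{t-1} + Gaussian noise, starting at zero."""
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    x = np.zeros((n_steps, p))
    for t in range(1, n_steps):
        x[t] = A @ x[t - 1] + rng.normal(0.0, noise_sd, size=p)
    return x

def fit_var1(x):
    """Least-squares estimate of the intercepts and the lag-1 coefficient matrix."""
    lagged = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    coef, *_ = np.linalg.lstsq(lagged, x[1:], rcond=None)
    intercept = coef[0]
    A_hat = coef[1:].T  # row i holds the effects of all lagged variables on variable i
    return intercept, A_hat

# Hypothetical two-variable system; eigenvalues 0.5 and 0.4 keep it stable.
A_true = np.array([[0.5, 0.2],
                   [0.0, 0.4]])
series = simulate_var1(A_true, n_steps=5000)
intercept, A_hat = fit_var1(series)
print(np.round(A_hat, 2))
```

With a series this long, the estimates should land close to the coefficients used to generate the data.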

Read More

Moderated Network Models for Continuous Data

Statistical network models have become a popular exploratory data analysis tool in psychology and related disciplines for studying relations between variables. The most popular models in this emerging literature are the binary-valued Ising model and the multivariate Gaussian distribution for continuous variables, which both model interactions between pairs of variables. In these pairwise models, the interaction between any pair of variables A and B is a constant and therefore does not depend on the values of any of the variables in the model. Put differently, none of the pairwise interactions is moderated. However, in highly complex and contextualized fields like psychology, such moderation effects are often plausible. In this blog post, I show how to fit, analyze, visualize and assess the stability of Moderated Network Models for continuous data with the R-package mgm.

Read More

Regression with Interaction Terms - How Centering Predictors influences Main Effects

Centering predictors in a regression model with only main effects has no influence on the main effects. In contrast, in a regression model including interaction terms, centering predictors does have an influence on the main effects. After getting confused by this, I read this nice paper by Afshartous & Preston (2011) on the topic and played around with the examples in R. I summarize the resulting notes and code snippets in this blog post.
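The two claims can be checked numerically. The sketch below uses simulated data with made-up coefficients and a small helper `ols` (both assumptions for illustration, written in Python rather than R): it fits the model with and without centering and compares the coefficients. With an interaction term, the main effect of a centered predictor equals the uncentered main effect plus the interaction coefficient times the mean of the other predictor.

```python
import numpy as np

def ols(columns, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(2.0, 1.0, n)   # predictors with non-zero means
x2 = rng.normal(-1.0, 1.0, n)
y = 1.0 + 0.5 * x1 + 0.3 * x2 + 0.8 * x1 * x2 + rng.normal(0.0, 1.0, n)

x1c = x1 - x1.mean()
x2c = x2 - x2.mean()

# Main effects only: centering changes the intercept but not the slopes.
main_raw = ols([x1, x2], y)
main_cen = ols([x1c, x2c], y)

# With an interaction: the main effects change, the interaction coefficient does not.
int_raw = ols([x1, x2, x1 * x2], y)
int_cen = ols([x1c, x2c, x1c * x2c], y)

print("main-effects model slopes:", main_raw[1:], main_cen[1:])
print("interaction model, raw:     ", int_raw[1:])
print("interaction model, centered:", int_cen[1:])
print("predicted centered effect of x1:", int_raw[1] + int_raw[3] * x2.mean())
```

The shift follows from substituting x1 = x1c + mean(x1) and x2 = x2c + mean(x2) into the uncentered model and collecting terms, so the centered main effect describes the effect of one predictor at the mean of the other rather than at zero.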

Read More

Deconstructing 'Measurement error and the replication crisis'

Yesterday, I read ‘Measurement error and the replication crisis’ by Eric Loken and Andrew Gelman, which left me puzzled. The first part of the paper consists of general statements about measurement error. The second part consists of the claim that in the presence of measurement error, we overestimate the true effect when having a small sample size. This sounded wrong enough to ask the authors for their simulation code and spend a couple of hours figuring out what they did in their paper. I am offering a short and a long version.

Read More

Predictability in Network Models

Network models have become a popular way to abstract complex systems and gain insights into relational patterns among observed variables in many areas of science. The majority of these applications focus on analyzing the structure of the network. However, if the network is not directly observed (Alice and Bob are friends) but estimated from data (there is a relation between smoking and cancer), we can analyze, in addition to the network structure, the predictability of the nodes in the network. That is, we would like to know: how well can a given node in the network be predicted by all remaining nodes in the network?
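For continuous variables, one possible way to quantify this is the proportion of variance of a node that is explained by all other nodes. The sketch below assumes that measure and uses simulated Gaussian data with a made-up covariance matrix. It computes the explained variance by regressing each node on the rest, and also directly from the inverse of the sample covariance matrix. The function names are hypothetical, and other node types call for other measures.

```python
import numpy as np

def r2_by_regression(data, i):
    """Explained variance of node i when regressed on all remaining nodes."""
    y = data[:, i]
    others = np.delete(data, i, axis=1)
    X = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return 1.0 - residuals.var() / y.var()

def r2_from_precision(data):
    """Same quantity for every node at once: 1 - 1 / (S_ii * K_ii), with K = inv(S)."""
    S = np.cov(data, rowvar=False)
    K = np.linalg.inv(S)
    return 1.0 - 1.0 / (np.diag(S) * np.diag(K))

# Hypothetical three-node Gaussian network.
cov = np.array([[1.0, 0.5, 0.2],
                [0.5, 1.0, 0.3],
                [0.2, 0.3, 1.0]])
rng = np.random.default_rng(2)
data = rng.multivariate_normal(np.zeros(3), cov, size=1000)

r2_reg = np.array([r2_by_regression(data, i) for i in range(3)])
r2_prec = r2_from_precision(data)
print(np.round(r2_reg, 3), np.round(r2_prec, 3))
```

Both routes give the same numbers because the residual variance of a node given all others is the reciprocal of the corresponding diagonal element of the precision matrix.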

Read More

Interactions between Categorical Variables in Mixed Graphical Models

In a previous post we estimated a Mixed Graphical Model (MGM) on a dataset of mixed variables describing different aspects of the life of individuals diagnosed with Autism Spectrum Disorder, using the mgm package. For interactions between continuous variables, the weighted adjacency matrix fully describes the underlying interaction parameter. Correspondingly, the parameters are fully represented in the graph visualization: the width of the edges is proportional to the absolute value of the parameter, and the edge color indicates the sign of the parameter. This means that we can clearly interpret an edge between two continuous variables as a positive or negative linear relationship of some strength.

Read More

Estimating Mixed Graphical Models

Determining conditional independence relationships through undirected graphical models is a key component in the statistical analysis of complex observational data in a wide variety of disciplines. In many situations one seeks to estimate the underlying graphical model of a dataset that includes variables of different domains.

Read More