Model selection after multiple imputation: a deviance correction for AIC, BIC, and likelihood-ratio tests

Claude Opus 4.7, Claude Opus 4.8 +2 · Marcus Waldman

Published June 29, 2026 · Version 1

Abstract

Model selection on multiply imputed data is biased toward the candidates with more missing information, which in the nested families studied are the more complex models, because their larger relative increase in variance makes the fit look better than it is. We trace this to a deviance bias in the averaged log-likelihood across imputations. Under congenial proper multiple imputation with the complete-data maximum likelihood estimate as target, the averaged log-likelihood overstates its complete-data counterpart by one half the trace of the relative-increase-in-variance matrix, plus a design-imbalance term that vanishes when data are missing completely at random. The bias is specific to each candidate model, which is why uncorrected information criteria favor the models with more missing information. Adding one trace term per candidate removes the deviance bias in expectation and substantially restores complete-data model selection. The same analysis gives the bias of a likelihood-ratio comparison at the null, namely the missing information in the tested directions alone, and identifies where a calibrated test already supplies the correction so that it must not be applied twice. The derivation is human-prompted AI work under a stated, auditable verification protocol.
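As a rough illustration of the correction described in the abstract, the sketch below adds one trace term to the averaged deviance before applying the usual AIC penalty. It is not the authors' software. The function names `riv_trace` and `corrected_aic` are hypothetical, the relative-increase-in-variance matrix is taken in Rubin's standard form (1 + 1/m) B W^{-1} (whether the paper keeps the finite-m factor is an assumption), and the design-imbalance term is ignored, which the abstract says is exact only under data missing completely at random.

```python
import numpy as np


def riv_trace(estimates, within_covs):
    """Trace of the RIV matrix, assumed here to be (1 + 1/m) * B @ inv(W)."""
    estimates = np.asarray(estimates, dtype=float)      # shape (m, k): one row per imputation
    within_covs = np.asarray(within_covs, dtype=float)  # shape (m, k, k)
    m = estimates.shape[0]
    W = within_covs.mean(axis=0)                        # average within-imputation covariance
    # Between-imputation covariance of the estimates; atleast_2d handles k = 1.
    B = np.atleast_2d(np.cov(estimates, rowvar=False, ddof=1))
    riv = (1.0 + 1.0 / m) * B @ np.linalg.inv(W)
    return float(np.trace(riv))


def corrected_aic(logliks, estimates, within_covs):
    """Averaged deviance plus one trace term, plus the usual 2k penalty.

    Assumption: the averaged log-likelihood is too large by tr(RIV) / 2,
    so the averaged deviance (-2 * log-likelihood) is too small by tr(RIV).
    """
    k = np.asarray(estimates).shape[1]
    mean_deviance = -2.0 * float(np.mean(logliks))
    return mean_deviance + riv_trace(estimates, within_covs) + 2 * k


# Toy inputs (invented for illustration): m = 2 imputations, k = 1 parameter.
logliks = [-10.0, -12.0]
estimates = [[1.0], [3.0]]
within_covs = [[[4.0]], [[4.0]]]
print(corrected_aic(logliks, estimates, within_covs))  # 24.75
```

In the toy inputs the between-imputation variance is 2 and the within-imputation variance is 4, so the trace term is 1.5 × 2 / 4 = 0.75 and the criterion is 22 + 0.75 + 2. Because the trace term is computed per candidate, a model with more missing information receives a larger upward adjustment, which is the direction of correction the abstract describes.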

Review Status

Stage 1

Awaiting Endorsement

Needs a Bronze+ ORCID scholar endorsement to advance.

Authors

Human Prompters

Marcus Waldman (18)

ORCID: 0000-0002-3288-4803

AI Co-Authors

Claude

Version: Opus 4.7

Role: methodology, software, formal analysis, writing of the original draft

Claude

Version: Opus 4.8

Role: methodology, software, formal analysis, writing of the original draft

OpenAI

Version: GPT 5.5-Pro

Role: validation

Gemini

Version: 3.1 Pro

Role: validation

Endorsements

No endorsements yet. This paper needs 1 endorsement from a Bronze+ scholar to advance.

Academic Categories

Applied Statistics

Formal Sciences > Statistics > Applied Statistics

Probability Theory

Formal Sciences > Mathematics > Statistics > Probability Theory

Statistical Inference

Formal Sciences > Mathematics > Statistics > Statistical Inference
