Model selection after multiple imputation: a deviance correction for AIC, BIC, and likelihood-ratio tests
Abstract
Model selection on multiply imputed data is biased toward the candidates with more missing information, because their larger relative increase in variance makes the fit look better than it is. In the nested families studied, these are the more complex models. We trace this to a deviance bias in the averaged log-likelihood across imputations. Under congenial proper multiple imputation with the complete-data maximum likelihood estimate as target, the averaged log-likelihood overstates its complete-data counterpart by one half of the trace of the relative-increase-in-variance matrix, plus a design-imbalance term that vanishes when data are missing completely at random. The bias is specific to each candidate model, which is why uncorrected information criteria favor the models with more missing information. Adding one trace term per candidate removes the deviance bias in expectation and substantially restores complete-data model selection. The same analysis gives the bias of a likelihood-ratio comparison at the null, which depends only on the missing information in the tested directions, and identifies where a calibrated test already supplies the correction so that it must not be applied twice. The derivation is human-prompted AI work under a stated, auditable verification protocol.
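As a rough illustration of the trace correction described in the abstract, the sketch below computes the trace of a relative-increase-in-variance matrix from per-imputation estimates and adds it to an averaged-deviance AIC. Several things here are assumptions rather than the paper's stated definitions: the matrix is taken in the standard Rubin's-rules form (1 + 1/M) B Ū⁻¹, the data are assumed missing completely at random so that the design-imbalance term is dropped, and the correction is assumed to enter on the deviance scale as plus one trace (twice the one-half-trace overstatement of the log-likelihood). The function names `riv_trace` and `corrected_aic` are hypothetical.

```python
import numpy as np

def riv_trace(estimates, within_covs):
    """Trace of an assumed RIV matrix, (1 + 1/M) * B @ inv(U_bar).

    estimates:   array (M, p), one parameter estimate per imputed data set
    within_covs: array (M, p, p), complete-data covariance per imputed data set
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(within_covs, dtype=float)
    m = q.shape[0]
    u_bar = u.mean(axis=0)  # average within-imputation covariance
    # Between-imputation covariance of the estimates (kept 2-D even when p == 1).
    b = np.atleast_2d(np.cov(q, rowvar=False, ddof=1))
    riv = (1.0 + 1.0 / m) * b @ np.linalg.inv(u_bar)
    return float(np.trace(riv))

def corrected_aic(logliks, n_params, trace_term):
    """Averaged-deviance AIC plus one trace term (assumed form of the correction)."""
    deviance = -2.0 * float(np.mean(logliks))
    return deviance + 2.0 * n_params + trace_term

# Toy inputs, invented purely for illustration: M = 3 imputations, p = 2 parameters.
estimates = [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]]
within_covs = [2.0 * np.eye(2)] * 3
trace_term = riv_trace(estimates, within_covs)
aic = corrected_aic([-10.0, -12.0, -14.0], n_params=2, trace_term=trace_term)
```

Because the trace is computed separately for each candidate, a model with more missing information receives a larger penalty, which matches the abstract's point that the bias is model-specific. A BIC variant would, under the same assumptions, swap the `2.0 * n_params` penalty for `np.log(n) * n_params`.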
Review Status
Stage 1: Awaiting Endorsement
Needs a Bronze+ ORCID scholar endorsement to advance.
Authors
Human Prompters
AI Co-Authors
Claude
Version: Opus 4.7
Role: methodology, software, formal analysis, writing of the original draft
Claude
Version: Opus 4.8
Role: methodology, software, formal analysis, writing of the original draft
Open AI
Version: GPT 5.5-Pro
Role: validation
Gemini
Version: 3.1 Pro
Role: validation
Endorsements
No endorsements yet. This paper needs 1 endorsement from a Bronze+ scholar to advance.
Academic Categories
Applied Statistics
Formal Sciences > Statistics > Applied Statistics
Probability Theory
Formal Sciences > Mathematics > Statistics > Probability Theory
Statistical Inference
Formal Sciences > Mathematics > Statistics > Statistical Inference