The Statistics Underneath AI Underwriting: Coefficients, ANOVA, and What the Models Are Actually Estimating
The multifamily pro forma has been, for forty years, a deterministic artifact. Assume rents, assume growth, assume stabilized vacancy, plug in a cap rate, print the IRR. The number at the bottom was treated as a forecast even though the inputs were asserted, not estimated.
That is finally ending. The underwriting tools shipping in 2026 are not faster spreadsheets. They are statistical models. Describing them accurately requires a different vocabulary. Coefficients. Residuals. Variance budgets. Out-of-sample error. Regularization. Most of the industry is still talking about AI underwriting in the language of automation, and that understates both what the tools are doing and what they demand of the analyst using them.
1. The pro forma is a regression problem
Write down the underwriting task formally. Next-year NOI for an asset is a function of a vector of features (job growth in the MSA, wage growth, supply under construction, net migration, cap rate spread over the ten-year Treasury, asset vintage, unit mix, operator track record) plus an irreducible error term.
$$ \text{NOI}_t = f(X_t) + \varepsilon_t $$
Everything interesting about AI underwriting sits inside three questions about that equation. What shape does $f$ take: linear, additive, interactive, or deeply non-linear? How much of the variance in NOI is captured by $f(X_t)$, and how much lives in $\varepsilon_t$? How stable are the parameters of $f$ across time and across markets?
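The second question can be written down directly. As a sketch, and assuming the error term is uncorrelated with the fitted component (an assumption added here, not something the equation above guarantees), the variance of NOI splits into an explained and an unexplained part, and the explained share is the familiar $R^2$:

$$ \operatorname{Var}(\text{NOI}_t) = \operatorname{Var}\big(f(X_t)\big) + \operatorname{Var}(\varepsilon_t), \qquad R^2 = 1 - \frac{\operatorname{Var}(\varepsilon_t)}{\operatorname{Var}(\text{NOI}_t)} $$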
The traditional pro forma answers none of these. It asserts inputs and hopes.
2. What the coefficients actually say
Fit a basic linear model of annual effective rent growth on a panel of metro-level multifamily data from 2012 through 2024, with covariates standardized, and the coefficient table comes out roughly like this. The numbers below are illustrative of what practitioners see in their own estimation, not citations of any specific paper.
| Covariate | Standardized β | Std. error | t-stat | Reading |
|---|---|---|---|---|
| Job growth, 12 month | +0.68 | 0.04 | 17.0 | The single largest explanator |
| Wage growth, 12 month | +0.42 | 0.05 | 8.4 | Partly orthogonal to jobs |
| Net new supply | −0.31 | 0.03 | −10.3 | The only reliably negative coefficient |
| Net migration | +0.22 | 0.04 | 5.5 | Slower moving, persistent |
| 30 year mortgage rate | −0.14 | 0.06 | −2.3 | Weak, confounded with cycle |
Cross validated R² lands in the 0.55 to 0.70 range depending on the market universe. Two observations matter more than the specific numbers.
First, the signs and magnitudes are stable. Refit the specification on 2012 to 2018, or 2015 to 2021, or 2018 to 2024. The coefficients move by tenths, not halves. That stability is what makes the model useful prospectively. It is also what makes its failures diagnostic.
Second, standardized coefficients rank features on a common scale. In the table above, job growth (+0.68) matters roughly five times as much as mortgage rates (−0.14) per standard deviation of variation, and about one and a half times as much as wage growth (+0.42). Every experienced underwriter already knows this qualitatively. The regression just quantifies it. When a deal is sensitive to a feature the model considers marginal, that is a signal to stress test, not a signal that the model is wrong.
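For readers who want to see where a table like this comes from, here is a minimal sketch using NumPy. Everything in it is an assumption for illustration: the data is synthetic, the effect sizes are made up, and `standardized_ols` is a hypothetical helper, not a library function. It fits OLS on z-scored variables, derives standard errors and t-stats, and then refits on overlapping windows to mimic the stability check described above.

```python
import numpy as np

def standardized_ols(X, y):
    """OLS on z-scored covariates and target.

    Returns (beta, std_err, t_stat) for the covariates, intercept excluded.
    """
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    A = np.column_stack([np.ones(len(yz)), Xz])      # intercept + covariates
    coef, *_ = np.linalg.lstsq(A, yz, rcond=None)
    resid = yz - A @ coef
    dof = A.shape[0] - A.shape[1]                    # observations minus parameters
    sigma2 = resid @ resid / dof                     # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return coef[1:], se[1:], coef[1:] / se[1:]

# Synthetic stand-in for a metro panel; the true effects below are invented.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
y = 0.7 * X[:, 0] + 0.4 * X[:, 1] - 0.3 * X[:, 2] + rng.normal(scale=0.6, size=n)

names = ["job_growth", "wage_growth", "net_new_supply"]
beta, se, t_stat = standardized_ols(X, y)
for name, b, s, t in zip(names, beta, se, t_stat):
    print(f"{name:>15}  beta={b:+.2f}  se={s:.3f}  t={t:+.1f}")

# Stability check: refit on overlapping windows and compare coefficients.
windows = [(0, 1000), (500, 1500), (1000, 2000)]
window_betas = np.array([standardized_ols(X[lo:hi], y[lo:hi])[0] for lo, hi in windows])
max_shift = float((window_betas.max(axis=0) - window_betas.min(axis=0)).max())
print(f"largest coefficient range across windows: {max_shift:.2f}")
```

On real panel data the rows are not independent, so the plain OLS standard errors here would be too small; the point of the sketch is only the mechanics of the table.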
3. ANOVA and the variance budget
Regression tells you direction and magnitude. ANOVA tells you where the variance lives.
Decompose the variation in exit cap rates across a reasonable transaction dataset. The decomposition almost always looks qualitatively similar.
| Source | Share of total variance | F statistic |
|---|---|---|
| Market, MSA level | ≈ 55% | Highly significant |
| Submarket within market | ≈ 15% | Significant |
| Vintage and product class | ≈ 12% | Significant |
| Asset specific | ≈ 8% | Variable |
| Unexplained residual | ≈ 10% | n/a (irreducible error term) |
The operational reading is blunt. Roughly seventy percent of the variation in what you will sell your asset for is determined before you touch the deal. Market selection and submarket choice dominate the budget. The asset level work we obsess over (amenity package, management team, capex strategy) competes for less than a tenth of the total.
The uncomfortable corollary. If your edge is concentrated in asset level execution, you are fighting for the 8% bucket while the 70% bucket is set by your market selection. Teams that cannot explain their market picks in coefficient language are usually not picking them; they are accepting them.
A nested ANOVA, market into submarket into asset, is the right specification. Real estate data is hierarchical, and flat regressions on nested data understate standard errors and produce spurious precision.
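The nesting can be made concrete with a sum-of-squares decomposition. The sketch below is an illustration under stated assumptions: the transactions are simulated, the effect sizes are invented, and `nested_variance_shares` is a hypothetical helper. It splits total variation into a market component, a submarket-within-market component, and a residual. It is not a full mixed-effects model, which is what one would likely reach for to get correct standard errors.

```python
import random
from collections import defaultdict
from statistics import fmean

def nested_variance_shares(rows):
    """rows: iterable of (market, submarket, value).

    Returns each level's share of the total sum of squares for a
    market -> submarket -> residual nested decomposition.
    """
    rows = list(rows)
    grand = fmean(v for _, _, v in rows)
    by_market, by_sub = defaultdict(list), defaultdict(list)
    for m, s, v in rows:
        by_market[m].append(v)
        by_sub[(m, s)].append(v)
    m_mean = {m: fmean(vs) for m, vs in by_market.items()}
    s_mean = {k: fmean(vs) for k, vs in by_sub.items()}
    # Between markets: market means around the grand mean.
    ss_market = sum(len(vs) * (m_mean[m] - grand) ** 2 for m, vs in by_market.items())
    # Submarkets within markets: submarket means around their own market mean.
    ss_sub = sum(len(vs) * (s_mean[k] - m_mean[k[0]]) ** 2 for k, vs in by_sub.items())
    # Residual: individual assets around their submarket mean.
    ss_resid = sum((v - s_mean[(m, s)]) ** 2 for m, s, v in rows)
    total = ss_market + ss_sub + ss_resid
    return {
        "market": ss_market / total,
        "submarket_within_market": ss_sub / total,
        "residual": ss_resid / total,
    }

# Simulated exit cap rates (percent); all effect sizes are made up.
rng = random.Random(7)
rows = []
for m in range(20):
    market_effect = rng.gauss(0, 1.0)
    for s in range(5):
        sub_effect = rng.gauss(0, 0.5)
        for _ in range(10):
            rows.append((m, s, 5.5 + market_effect + sub_effect + rng.gauss(0, 0.4)))

shares = nested_variance_shares(rows)
for level, share in shares.items():
    print(f"{level:>24}: {share:.1%}")
```

The three shares sum to one by construction, which is what makes "variance budget" a fair description.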
4. Beyond linearity
Linear models are the right place to start because their coefficients are interpretable. They are usually not the right place to stop.
The rent growth surface has meaningful interactions. Job growth matters more in supply constrained markets than in oversupplied ones. That interaction is invisible in an additive linear model. Gradient boosted trees (XGBoost, LightGBM) capture it natively, and on the same feature set they typically cut out of sample RMSE by fifteen to twenty five percent against OLS for forward rent growth.
That improvement is real, but not free.
Interpretability cost. A tree ensemble has no coefficient table. SHAP values and partial dependence plots are the right substitutes. Any committee presenting an ML driven forecast should include both.
Data hunger. Tree models overfit aggressively on small samples. Under about five thousand observations per market cluster, the non linear model probably is not justified, and the discipline of fitting OLS first will show that.
Tuning sensitivity. The gap between a well tuned GBM and a sloppy one is wider than the gap between a sloppy GBM and OLS. Max depth, learning rate, minimum samples per leaf, subsample ratio. All of it matters.
The test for whether non linearity is earning its keep is straightforward. Fit both. Compare on a time aware split. If the top SHAP features are the same as the top regression coefficients and the interactions are weak, the linear model wins on parsimony.
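The "fit both, compare on a time aware split" test can be sketched without a boosting library. To keep the example dependency-light, an explicit interaction term stands in for the tree ensemble; that substitution is mine, and the same harness would apply to an XGBoost or LightGBM model. The data is synthetic, the `constrained` flag is a hypothetical supply-constraint indicator, and rows are assumed to be sorted by time.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3000
job = rng.normal(size=n)                                  # standardized job growth
constrained = rng.integers(0, 2, size=n).astype(float)    # 1 = supply constrained (hypothetical)
# Invented data-generating process: job growth matters more in constrained markets.
y = 0.3 * job + 0.5 * job * constrained + rng.normal(scale=0.5, size=n)

split = int(0.7 * n)   # time-ordered split: first 70% train, last 30% test, no shuffling

def fit_predict(A_train, y_train, A_test):
    """Fit OLS on the training rows and predict the test rows."""
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

def rmse(actual, pred):
    return float(np.sqrt(np.mean((actual - pred) ** 2)))

ones = np.ones(n)
additive = np.column_stack([ones, job, constrained])
interactive = np.column_stack([ones, job, constrained, job * constrained])

rmse_add = rmse(y[split:], fit_predict(additive[:split], y[:split], additive[split:]))
rmse_int = rmse(y[split:], fit_predict(interactive[:split], y[:split], interactive[split:]))
improvement = 1.0 - rmse_int / rmse_add

print(f"additive RMSE:    {rmse_add:.3f}")
print(f"interactive RMSE: {rmse_int:.3f}")
print(f"out-of-sample improvement: {improvement:.1%}")
```

If the improvement is close to zero on held-out data, the extra flexibility is not earning its keep and the additive model wins on parsimony.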
5. Out of sample testing is not optional
The single most common failure in AI underwriting work product is temporal leakage: evaluating a model on a shuffled train test split, so that observations from 2023 end up in training and observations from 2019 end up in test. This inflates accuracy metrics catastrophically and is invisible to anyone not looking for it.
The defensible protocols are specific.
Time ordered split. Train on everything up to time $t$, test on everything after. No shuffling.
Walk forward cross validation. Rolling windows, each producing an out of sample error estimate, aggregated across folds. This mimics the real underwriting task, which is forecasting next year from what you know today.
Benchmark against naive baselines. The first benchmark is not another model. It is next year's rent growth equals last year's rent growth. A model that does not beat the naive baseline out of sample is not a model. It is a graph.
Report MAPE and RMSE. Mean absolute percentage error speaks to practitioner intuition. Root mean squared error punishes large misses, which is what matters for tail risk underwriting.
Any vendor unwilling to describe their validation protocol at this level of specificity should be assumed not to have done it.
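The protocol above fits in a few lines of plain Python. This is a sketch under stated assumptions: the rent series is hypothetical, `mean_growth_forecast` is a deliberately crude placeholder for a real model, and the walk forward uses an expanding window (a rolling window would slice `history[-k:]` instead). MAPE is computed on rent levels rather than growth rates, because percentage errors are unstable when the actual value is near zero.

```python
import math

def naive_forecast(history):
    """Baseline: next year's growth equals last year's growth."""
    return history[-1] * (history[-1] / history[-2])

def mean_growth_forecast(history):
    """Placeholder 'model': apply the average historical growth factor."""
    growth = [b / a for a, b in zip(history, history[1:])]
    return history[-1] * (sum(growth) / len(growth))

def walk_forward(series, forecaster, min_train=3):
    """One-step-ahead forecasts, each using only data before the target year."""
    actual, pred = [], []
    for t in range(min_train, len(series)):
        history = series[:t]              # nothing from year t or later leaks in
        pred.append(forecaster(history))
        actual.append(series[t])
    return actual, pred

def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mape(actual, pred):
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, pred)) / len(actual)

# Hypothetical average monthly rents by year for one market (not real data).
rents = [1000, 1030, 1062, 1090, 1125, 1160, 1190, 1300, 1420, 1440, 1455]

results = {}
for name, forecaster in [("naive", naive_forecast), ("mean_growth", mean_growth_forecast)]:
    actual, pred = walk_forward(rents, forecaster)
    results[name] = {"rmse": rmse(actual, pred), "mape": mape(actual, pred), "pred": pred}
    print(f"{name:>12}: RMSE={results[name]['rmse']:.1f}  MAPE={results[name]['mape']:.2f}%")
```

Whatever replaces `mean_growth_forecast` has to beat the `naive` row on both metrics before it deserves a place in a memo.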
6. Where the models break
The most important statistical work in underwriting is the work done on the residuals. The variance the model cannot explain, roughly 10% in the cap rate decomposition above and 30% to 45% in the rent growth regression, is often exactly the part that decides the deal.
Known failure modes, roughly in order of how much damage they cause.
Non stationarity. The 2020 to 2022 rent cycle was a structural break. Models trained through 2019 missed it. Models trained only on 2020 through 2022 are overfit to a once in a generation migration shock. Production models need explicit treatment of regime changes, whether through change point detection, dummies for known shocks, or Bayesian priors that widen during structural breaks.
Confounding. Low interest rates and strong rent growth coincided for most of 2012 through 2021. A regression fit on that period cannot cleanly separate the two effects, and extrapolating either coefficient to a rising rate environment is hazardous. The analyst's judgment about which causal story to privilege is not replaceable by more data.
Survivorship bias. The assets that traded are not a random sample of the assets that existed. Distressed assets that never reached closing are underrepresented, biasing cap rate distributions toward the cleaner end of the market.
Sponsor quality. Models cannot observe the execution capacity of the operating team. That is the largest irreducible source of $\varepsilon$ in the multifamily underwriting equation, and it is the reason allocator level judgment still decides outcomes.
A disciplined workflow acknowledges each of these in writing on every memo. A model that claims to have solved them should be assumed to be miscalibrated.
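The first two failure modes interact, and a small simulation shows how. The sketch below is illustrative only: the series is synthetic, the 36-month shock window and every effect size are invented, and a simple indicator variable stands in for the "dummies for known shocks" option. Because the covariate is elevated during the shock, a regression that ignores the regime attributes part of the shock to the covariate.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 156                                    # monthly observations over 13 years
shock = np.zeros(n)
shock[96:132] = 1.0                        # hypothetical 36-month shock window
x = rng.normal(size=n) + 1.0 * shock       # covariate runs hot during the shock
true_beta = 0.5
y = true_beta * x + 3.0 * shock + rng.normal(scale=0.5, size=n)

def ols(A, y):
    """Return OLS coefficients for design matrix A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

ones = np.ones(n)
beta_without = ols(np.column_stack([ones, x]), y)[1]          # regime ignored
beta_with = ols(np.column_stack([ones, x, shock]), y)[1]      # shock dummy included

print(f"true effect of x:         {true_beta:.2f}")
print(f"estimate without dummy:   {beta_without:.2f}")
print(f"estimate with dummy:      {beta_with:.2f}")
```

The dummy only helps for shocks the analyst already knows about, which is why the written acknowledgment on the memo still matters.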
7. The analyst's new job
AI underwriting does not eliminate the underwriter. It changes what the underwriter is actually doing on a Tuesday.
The highest leverage activities on a 2026 underwriting desk are:
Feature engineering. Deciding which variables belong in $X$ and in what form, logs, lags, first differences, interactions. This is where domain expertise compounds with statistical fluency, and where most of the real alpha lives.
Diagnostic testing. Heteroskedasticity, autocorrelation, multicollinearity, influential observations. The models run regardless of whether the assumptions are met. The analyst's job is to notice when they are not.
Residual interrogation. Every deal the model mispriced by more than some threshold gets a written post mortem. Why did the model miss? Is it a recurring blind spot, a regime change, or an outlier the model correctly deemed low probability?
Judgment layering. The model produces a point estimate and a prediction interval. The analyst decides what range of outcomes to underwrite to, which depends on risk appetite, cost of capital, and fund strategy. None of those live inside the statistical model.
The best underwriters in 2026 are not the ones who run the most scenarios. They are the ones who can describe, in coefficient language, what they believe about the world and where they are prepared to be wrong.
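Two of the diagnostics named above are short enough to compute by hand. The sketch below is a minimal illustration on synthetic data, with `vif` and `durbin_watson` written out as hypothetical helpers rather than taken from a statistics package. The variance inflation factor flags multicollinearity (here, wage growth is built to track job growth), and the Durbin-Watson statistic flags autocorrelated residuals (values near 2 suggest none, values well below 2 suggest positive autocorrelation).

```python
import numpy as np

def ols_residuals(A, y):
    """Residuals from an OLS fit of y on design matrix A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ coef

def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2) from regressing
    each covariate on the remaining ones, with an intercept."""
    out = []
    for j in range(X.shape[1]):
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        resid = ols_residuals(others, X[:, j])
        r2 = 1.0 - (resid @ resid) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

def durbin_watson(resid):
    """Sum of squared successive differences over the sum of squared residuals."""
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))

# Synthetic data: wage growth is correlated with job growth, errors follow an AR(1).
rng = np.random.default_rng(3)
n = 500
job = rng.normal(size=n)
wage = 0.9 * job + np.sqrt(1 - 0.81) * rng.normal(size=n)
supply = rng.normal(size=n)
X = np.column_stack([job, wage, supply])

eps = np.zeros(n)
innovations = rng.normal(scale=0.5, size=n)
for t in range(1, n):
    eps[t] = 0.7 * eps[t - 1] + innovations[t]      # persistent errors
y = 0.6 * job + 0.3 * wage - 0.3 * supply + eps

resid = ols_residuals(np.column_stack([np.ones(n), X]), y)
vifs = vif(X)
dw = durbin_watson(resid)
for name, v in zip(["job_growth", "wage_growth", "net_new_supply"], vifs):
    print(f"VIF {name:>15}: {v:.2f}")
print(f"Durbin-Watson: {dw:.2f}")
```

The regression in this example runs without complaint, which is the point: nothing in the fitting step reports that two covariates are nearly redundant or that the errors are serially correlated.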
Closing
The last decade of AI underwriting was sold as automation, the same pro forma but faster. The next decade is a change in epistemology. The pro forma stops being the primary artifact. It gets replaced by a probabilistic forecast with explicit coefficients, explicit assumptions, and explicit error bars.
The firms that treat this as a software upgrade will underperform the firms that treat it as a new way of thinking about what underwriting actually is.
The statistics were always there. The models just make it harder to pretend otherwise.