5 Reasons Your Demand Forecast Is Underperforming (And How to Fix Them)

McAdoo Analytics · 8 min read

Forecast error above 25% MAPE is rarely a data problem. In most cases, it's a structural one — patterns in how the model was built, what data it sees, and how it gets maintained. After working with manufacturers, utilities, and distributors across a range of data environments, the same five issues surface almost every time.

1. You're Forecasting from Sales History Alone

Historical sales are necessary but not sufficient. Demand is shaped by promotions, pricing changes, seasonality, competitor activity, macroeconomic signals, and dozens of other factors your sales history doesn't capture. A model trained only on past units will consistently miss inflection points.

The fix: Incorporate external regressors (promotional calendars, pricing data, weather indices, or leading economic indicators) as model inputs. Even a few well-chosen covariates can meaningfully improve accuracy.

Practical note: You don't need perfect external data. A clean promotional flag and a handful of calendar features (week of year, month-end) often move the needle more than sophisticated ML architecture alone.
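As a rough sketch of what those features might look like, the snippet below builds a promotional flag and a few calendar covariates for each date. The function name, the three-day month-end window, and the promotional dates are all illustrative assumptions, not a prescribed feature set.

```python
from datetime import date, timedelta
import calendar


def calendar_features(day: date, promo_days: set[date]) -> dict:
    """Build a few simple covariates for a single date."""
    last_day = calendar.monthrange(day.year, day.month)[1]
    return {
        "week_of_year": day.isocalendar()[1],
        "month": day.month,
        # Flag the final 3 days of the month (window width is an assumption).
        "is_month_end": int(last_day - day.day < 3),
        "on_promo": int(day in promo_days),
    }


# Hypothetical promotional calendar for illustration only.
promo_days = {date(2024, 3, 29), date(2024, 3, 30)}
start = date(2024, 3, 28)
rows = [calendar_features(start + timedelta(days=i), promo_days) for i in range(4)]
```

Each dictionary in rows can be joined to the sales history by date and passed to the model alongside past units.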

2. You're Running One Model Across All SKUs

A single model fit to your entire catalog averages the behavior of your fastest-moving items with your slowest, and your most seasonal with your most stable. The result is a model that fits nothing particularly well.

The fix: Segment your SKUs by demand pattern — velocity, coefficient of variation, seasonality strength — and apply different modeling approaches to each segment. High-volume, stable SKUs and intermittent low-volume SKUs warrant entirely different methods.
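A minimal sketch of that kind of segmentation, using only the share of zero-demand periods and the coefficient of variation. The cutoffs, segment names, and SKU histories here are illustrative assumptions; real thresholds should come from your own catalog.

```python
from statistics import mean, pstdev


def segment_sku(demand, cv_cutoff=0.5, zero_share_cutoff=0.3):
    """Assign a non-empty, non-negative demand series to a coarse segment.

    Both cutoffs are illustrative assumptions, not recommended values.
    """
    zero_share = sum(1 for d in demand if d == 0) / len(demand)
    if zero_share >= zero_share_cutoff:
        return "intermittent"
    cv = pstdev(demand) / mean(demand)  # coefficient of variation
    return "stable" if cv < cv_cutoff else "variable"


# Hypothetical demand histories, one value per period.
history = {
    "SKU-A": [100, 104, 98, 101, 97, 103],
    "SKU-B": [0, 0, 12, 0, 0, 7],
    "SKU-C": [20, 80, 15, 120, 30, 95],
}
segments = {sku: segment_sku(series) for sku, series in history.items()}
# segments == {"SKU-A": "stable", "SKU-B": "intermittent", "SKU-C": "variable"}
```

Each segment can then be routed to its own method, for example a smoothing model for the stable group and an intermittent-demand method for the sparse one.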

3. No Walk-Forward Validation

If your model was validated using a random train-test split, your error estimates are probably too optimistic. Demand forecasting is a time-series problem: a model must never be trained on data from after the period it is tested on, and a random split doesn't enforce that constraint.

The fix: Use walk-forward (expanding window) cross-validation. Train on months 1–12, test on 13–15. Then train on 1–15, test on 16–18. Repeat. This gives you honest, out-of-sample error estimates that reflect real-world performance.
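The expanding-window scheme described above can be sketched as a small generator. The function name and defaults are assumptions chosen to match the 12-month initial window and 3-month test blocks in the example; indices are 0-based, so index 0 is month 1.

```python
def walk_forward_splits(n_periods, initial_train=12, test_size=3):
    """Yield (train_idx, test_idx) pairs for expanding-window validation.

    Training always starts at period 0 and grows by test_size each fold,
    so no fold ever trains on data later than its test block.
    """
    train_end = initial_train
    while train_end + test_size <= n_periods:
        train_idx = list(range(train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        train_end += test_size


splits = list(walk_forward_splits(18))
for train_idx, test_idx in splits:
    # Convert to 1-based month numbers for readability.
    print(f"train 1-{train_idx[-1] + 1}, test {test_idx[0] + 1}-{test_idx[-1] + 1}")
```

With 18 months of history this produces two folds: train on months 1-12 and test on 13-15, then train on 1-15 and test on 16-18. Averaging the error across folds gives the out-of-sample estimate.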

4. Lead Time Isn't in the Model

A forecast horizon of 4 weeks is meaningless if your procurement lead time is 10 weeks. Most models are built without accounting for how far out the forecast actually needs to reach, and accuracy degrades badly at longer horizons when the model isn't explicitly trained for them.

The fix: Align your forecast horizon to your operational lead time. Build multi-horizon models or at minimum evaluate accuracy at the specific horizons your planning team actually uses for decisions.
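One way to check accuracy at the horizons that matter is to group forecast errors by horizon before averaging. The sketch below assumes each record is a (horizon in weeks, actual, forecast) tuple; that layout and the sample numbers are hypothetical.

```python
from collections import defaultdict


def mape_by_horizon(records):
    """Return {horizon: MAPE in percent} from (horizon, actual, forecast) rows.

    Rows with actual == 0 are skipped because MAPE is undefined there.
    """
    errors = defaultdict(list)
    for horizon, actual, forecast in records:
        if actual != 0:
            errors[horizon].append(abs(actual - forecast) / abs(actual))
    return {h: 100 * sum(e) / len(e) for h, e in sorted(errors.items())}


# Hypothetical evaluation rows: the same two SKUs forecast 4 and 10 weeks out.
records = [
    (4, 100, 90), (4, 200, 220),
    (10, 100, 70), (10, 200, 260),
]
result = mape_by_horizon(records)
```

In this made-up data the 4-week MAPE is about 10% and the 10-week MAPE about 30%, so a single blended number would hide the gap at the horizon procurement actually plans against.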

5. The Model Is Stale

A model trained on pre-2020 data and never updated reflects a fundamentally different demand reality from the one you operate in today. Demand patterns shift: seasonality changes, customer mix evolves, and supply disruptions introduce structural breaks. A model that isn't retrained periodically will drift.

The fix: Build retraining into the operational workflow. Even a quarterly refit on the most recent 24 months of data — with automated performance monitoring to flag drift — will sustain accuracy far better than a one-time build.
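Automated drift monitoring can start as simply as comparing recent error against the error measured at validation time. In this sketch the function name, the 4-period window, and the 1.25 tolerance multiplier are all illustrative assumptions.

```python
def needs_retrain(actuals, forecasts, baseline_mape, window=4, tolerance=1.25):
    """Return True when MAPE over the last `window` periods exceeds
    baseline_mape * tolerance. Periods with a zero actual are ignored.
    """
    recent = list(zip(actuals, forecasts))[-window:]
    pct_errors = [abs(a - f) / abs(a) for a, f in recent if a != 0]
    if not pct_errors:
        return False  # nothing measurable in the window
    recent_mape = 100 * sum(pct_errors) / len(pct_errors)
    return recent_mape > baseline_mape * tolerance


# Hypothetical series: forecasts track well at first, then fall behind.
actuals = [100, 100, 100, 100, 100, 100]
drifting = [98, 102, 95, 80, 75, 70]
steady = [98, 102, 95, 105, 97, 101]

flag_drifting = needs_retrain(actuals, drifting, baseline_mape=10.0)
flag_steady = needs_retrain(actuals, steady, baseline_mape=10.0)
```

Here flag_drifting is True (recent MAPE is 20% against a 12.5% threshold) and flag_steady is False. A check like this can run after each forecast cycle and trigger an early refit between scheduled ones.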

Modern ML Forecasting: What It Actually Adds

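Temporal Fusion Transformers (TFT) and gradient boosting models like XGBoost don't just "do ML" on your data. They can handle multivariate inputs, variable-length histories, and multi-horizon outputs more flexibly than most classical statistical models, although how much comes built in varies: TFT produces multi-horizon forecasts natively, while gradient boosting typically relies on engineered lag features and either one model per horizon or a recursive scheme. But they're not magic. The structural issues above still apply; they just have better raw material to work with once you address them.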

In practice, organizations that fix the five issues above and adopt modern ML forecasting typically see MAPE improve by 20–35% relative to their baseline, with the gains concentrated in the SKUs where demand is most variable and most important to get right.

A Practical Starting Point

If you're starting from scratch, focus first on data quality and segmentation. A clean, well-segmented ARIMA or ETS model will usually outperform a poorly scoped deep learning model. Get your structure right, then bring in more sophisticated modeling.

If you're maintaining a model that's decaying, prioritize walk-forward validation and a retraining schedule. You don't need a rebuild — you need a maintenance discipline.

Either way, the path from broken forecasting to reliable forecasting is shorter than most teams expect — usually 6–10 weeks with focused effort and the right approach.