Yield Estimation: Seeing Production Before the Harvest

Written by Hyperplan | Jul 2, 2026 9:07:11 AM

Every production number is really two numbers multiplied together: how much area is planted, and how much each hectare yields. Get the first and you know where the crop is. Get the second and you know how much is coming.

Most of the industry gets the first number in-season — and the second one after harvest.

Official yield statistics are accurate, but they arrive aggregated and late. Agreste confirms French department yields once the season is closed. USDA NASS confirms US yields after the outcome is already fixed. By the time the number is public, the decisions that depended on it — positioning exposure, allocating storage, planning logistics, sizing next campaign's demand, pricing risk — have already been made on assumptions.

And the number you're trying to anticipate moves a lot. Wheat yields swing hard from one season to the next — French yields have shifted by an average of 10% year to year since 2000, and fell 32% in a single season in 2016. At the EU-aggregate level those shocks look tame, around 5%, because strong and weak regions cancel out. But nobody trades, procures or insures "the European average" — they are exposed to a region, where one bad season can erase a fifth to a half of the crop, and where last year's number is a poor guide to this one.

The commercial edge was never knowing the final yield. It's seeing the divergence early: which regions are pulling ahead, which are falling behind, where supply risk is quietly building — while there's still time to act on it.

That's the number Hyperplan set out to deliver: an in-season yield estimate, region by region — at NUTS 3 granularity, months before public confirmation.

Who this is for

The same signal reads differently depending on where you sit:

→ Origination and trading desks get an early view of production risk and supply–demand balance by region, before public sources confirm it.

→ Cooperatives can plan procurement, storage allocation and transport against expected volumes, not last year's averages.

→ Input manufacturers and distributors sharpen demand planning and territory prioritization on expected performance, not acreage alone.

→ Crop insurers get an independent, explainable regional yield signal for benchmarking and loss anticipation.

Why yield is hard — and what we do differently

Yield is one of the hardest problems in agricultural remote sensing, for a simple reason: the ground truth is scarce, late, and coarse. Anyone can regress a vegetation index against a published yield figure and get a curve. Making that estimate reliable, comparable across countries, and explainable is the hard part — and it's where most approaches quietly break down.

Three things make our approach different.

1. It starts from a field map and an in-season crop map we built ourselves

A yield estimate is only ever as good as the field it's computed on. If the boundary is wrong, you're averaging across two parcels growing different crops. If the crop label is wrong, you're modelling the wrong plant entirely.

Because Hyperplan produces its own parcel-level field boundaries and its own in-season crop classification — the subjects of our two previous articles — every regional yield estimate is aggregated from correctly-identified fields of the right crop, rather than from mixed pixels or a coarse grid. That field-level foundation is invisible in the final number, but it's the reason the number holds up.

2. We measure the season in thermal time, aligned to crop phenology — not calendar dates

Instead of feeding raw, calendar-indexed satellite time series into a model, we slice each season into growing-degree-day (GDD) bins keyed to the crop's actual development stages — growth, flowering, grain filling — each with a crop-specific base temperature.

The practical consequence matters more than it sounds. A "flowering-window biomass" signal means the same physiological thing whether it's measured in south-west France in a hot year or in northern Germany in a cool one. Anchoring features to thermal time rather than the calendar is what lets a single model stay comparable across countries and seasons, instead of silently overfitting to one region's weather in one year.
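To make the thermal-time idea concrete, here is a minimal sketch that bins a daily signal such as NDVI by cumulative growing degree-days rather than by date. The stage boundaries, the base temperature passed in, and the function names are illustrative assumptions, not Hyperplan's actual parameters.

```python
from bisect import bisect_right

# Hypothetical stage boundaries in cumulative growing degree-days (GDD).
# Real thresholds are crop-specific; these values are placeholders.
STAGE_EDGES = [0.0, 600.0, 900.0, 1400.0]
STAGE_NAMES = ["growth", "flowering", "grain_filling"]


def cumulative_gdd(daily_mean_temps, base_temp):
    """Running sum of max(T_mean - base_temp, 0) from a fixed season start."""
    total, out = 0.0, []
    for temp in daily_mean_temps:
        total += max(temp - base_temp, 0.0)
        out.append(total)
    return out


def stage_means(daily_mean_temps, daily_values, base_temp):
    """Average a daily signal (e.g. NDVI) inside each thermal-time bin."""
    sums = {name: 0.0 for name in STAGE_NAMES}
    counts = {name: 0 for name in STAGE_NAMES}
    gdd_series = cumulative_gdd(daily_mean_temps, base_temp)
    for gdd, value in zip(gdd_series, daily_values):
        idx = bisect_right(STAGE_EDGES, gdd) - 1
        if 0 <= idx < len(STAGE_NAMES):  # days past the last edge are ignored
            sums[STAGE_NAMES[idx]] += value
            counts[STAGE_NAMES[idx]] += 1
    return {
        name: (sums[name] / counts[name] if counts[name] else None)
        for name in STAGE_NAMES
    }
```

Because the bins are defined in degree-days, a warm season reaches the "flowering" bin on an earlier calendar date than a cool one, yet the feature computed inside the bin describes the same development stage in both.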

3. Every input is a proxy for a known agronomic driver — and every output is explainable

Our models don't ingest a black box of pixels. Each feature maps to a biophysical process that actually moves yield:

Biomass: NDVI, EVI, NDMI
Water stress: soil available-water content, irrigable area, precipitation, net precipitation
Heat and cold stress: average temperature, frost-threshold days
Grain-filling energy: radiation, and the radiation-to-temperature ratio

A note on how the accuracy figures in the results section are defined:

MAPE: the average gap between our estimate and the official yield, as a percentage. An 8% MAPE on a 7 t/ha wheat region means the estimate is typically within about half a tonne per hectare (0.08 × 7 ≈ 0.56 t/ha).
Weighted by area: larger growing regions count more than marginal ones, so the figure reflects where the crop actually is.
Best 80% of surface: we report the four-fifths of cultivated area the models handle most accurately, and are explicit that the remaining fifth (small, atypical or heavily stressed regions) carries larger error.
Two years, shown separately: 2023 and 2025 are independent held-out seasons; splitting them shows the result isn't resting on one lucky year.

And because we run SHAP attribution on every model, we can say why a region's yield is estimated high or low — which driver pulled it there. When a trading desk or an insurer asks "why this number?", there's an answer, not a shrug.
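For readers unfamiliar with this kind of attribution, the sketch below computes exact Shapley values for a toy yield model against a reference region. It is not the SHAP library and not our production model: the function names, feature names and coefficients are all hypothetical, chosen only to show how an estimate can be split into per-driver contributions.

```python
from itertools import permutations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) per feature.

    Features outside a coalition are held at their baseline value.
    Cost grows factorially, so this only suits a handful of features.
    """
    names = list(x)
    contrib = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        prev = model(current)
        for name in order:
            current[name] = x[name]  # switch this feature to its real value
            new = model(current)
            contrib[name] += new - prev
            prev = new
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in contrib.items()}


def toy_model(f):
    """Toy yield model in t/ha; coefficients are purely illustrative."""
    frost_penalty = 0.15 * f["frost_days"]
    water_effect = 0.004 * f["net_precip_mm"] * f["ndvi_flowering"]
    return 3.0 + 5.0 * f["ndvi_flowering"] + water_effect - frost_penalty


region = {"ndvi_flowering": 0.6, "net_precip_mm": 100.0, "frost_days": 4}
reference = {"ndvi_flowering": 0.7, "net_precip_mm": 200.0, "frost_days": 2}
attribution = shapley_values(toy_model, region, reference)
for name, value in attribution.items():
    print(f"{name}: {value:+.2f} t/ha")
```

The contributions always sum to the gap between the region's estimate and the reference estimate, which is what makes the answer to "why this number?" an accounting of drivers rather than a guess.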

One more thing: we're explicit about what the models don't know. We compute thermal time from a fixed seasonal start rather than each field's exact sowing date, and thermal time alone doesn't fully capture how drought reshapes a crop's development. These are deliberate approximations — acceptable because they capture the first-order signal at scale, and because we'd rather be transparent about the error we carry than hide it behind a confidence interval nobody can interrogate.

Our results

The 2026 production models cover five crops — winter wheat, barley, rapeseed, maize and sunflower — validated across up to 17 European countries, from Spain to the Nordics and from France to Ukraine.

The service delivers yield at NUTS 3 (regional) level — the standard European statistical region, roughly a département in France, a Kreis (district) in Germany, or a județ (county) in Romania. It is validated at exactly that granularity, against official regional yield statistics — the only yield ground truth that exists at this scale. Like for like: no gap between what we measure and what we deliver.

Models were trained on the 2017–2022 and 2024 seasons and validated on two fully held-out years, 2023 and 2025 — including the most recent completed season. Nothing from those years touched training.

The metric below is mean absolute percentage error (MAPE), weighted by cultivated area, reported over the best 80% of surface — excluding the highest-error tail that makes up the last fifth of area. In plain terms: across four-fifths of the growing area, this is how far the estimate sits from the official figure.
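As a rough sketch of how such a metric can be computed, the function below sorts regions by absolute percentage error, keeps the lowest-error share of total area, and weights each region by its cultivated area. The tuple layout, the function name and the handling of the region that straddles the cut-off (counted only for the part of its area that fits) are assumptions made for illustration.

```python
def weighted_mape(regions, surface_share=1.0):
    """Area-weighted MAPE (%) over the lowest-error share of cultivated area.

    regions: iterable of (estimated_yield, official_yield, area_ha) tuples.
    surface_share: fraction of total area to keep, lowest error first.
    """
    rows = sorted(
        (abs(est - off) / off * 100.0, area) for est, off, area in regions
    )
    budget = surface_share * sum(area for _, area in rows)
    weighted_sum = 0.0
    used = 0.0
    for ape, area in rows:
        take = min(area, budget - used)  # partial area for the straddling region
        if take <= 0:
            break
        weighted_sum += ape * take
        used += take
    if used == 0:
        raise ValueError("no cultivated area selected")
    return weighted_sum / used


# Hypothetical regions: (estimate t/ha, official t/ha, area in ha).
example = [(7.0, 7.0, 100), (6.3, 7.0, 100), (3.5, 7.0, 50)]
print(round(weighted_mape(example, 0.8), 2))  # best 80% of surface
print(round(weighted_mape(example, 1.0), 2))  # full surface
```

In this toy case the third region is small and badly estimated, so it falls outside the best 80% of surface and the reported error is much lower than the full-surface figure. That is why the two numbers should always be read together.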

Crop	MAPE 2023	MAPE 2025
Winter wheat	7.1%	8.1%
Barley	9.2%	9.0%
Rapeseed	8.1%	6.8%
Maize	7.5%	8.9%
Sunflower	8.7%	5.6%

How to read this table: Two fully held-out seasons, shown separately. Area-weighted MAPE vs. official NUTS 3 yields, over the 80% of surface with the smallest error. Across the five crops, validation spans up to 17 countries and 300–700+ NUTS 3 regions per crop. For every crop, the estimate lands within roughly 6–9% of official yield, and that holds across both held-out years. 2025, a distinctly different season from 2023, performs comparably: error is lower on barley, rapeseed and sunflower and about one to one and a half points higher on wheat and maize, so no single year is carrying the result. No crop is consistently easiest or hardest across the two years: barley carries the highest error on average and sunflower the widest swing between seasons, while maize and sunflower span wider production systems and are more exposed to late-season stress.

And the models keep improving, season over season. Across the full growing surface, this year's generation cut NUTS 3 error by roughly 40% on wheat and maize, and by about two-thirds on rapeseed, versus the previous one — the result of retraining and refinement every season, not a one-off calibration.

Zoom out to the national total, and the error shrinks again. Consolidated to a country's full production — every region summed — the models land within roughly 2–3% of official national yield on most crops. That is not a contradiction of the regional figures: regional over- and under-estimates offset one another once aggregated, so the national number you would size a supply balance or a trading position on is tighter than any single region's. In other words, the estimate is granular enough to act on region by region, and reliable enough to trust in aggregate. Regional detail and per-country breakdowns are available on request.
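The offsetting effect is simple arithmetic, sketched here with three hypothetical regions of equal area. The numbers and function names are invented for illustration and are not taken from our validation set.

```python
def national_yield(regional_yields, areas):
    """Area-weighted national yield: total production over total area."""
    production = sum(y * a for y, a in zip(regional_yields, areas))
    return production / sum(areas)


def pct_error(estimate, official):
    """Absolute percentage error of an estimate against the official figure."""
    return abs(estimate - official) / official * 100.0


# Hypothetical regional yields in t/ha and areas in ha.
official = [7.0, 6.0, 8.0]
estimated = [7.5, 5.6, 8.2]
areas = [100, 100, 100]

regional_errors = [pct_error(e, o) for e, o in zip(estimated, official)]
national_error = pct_error(
    national_yield(estimated, areas), national_yield(official, areas)
)
print([round(err, 1) for err in regional_errors])
print(round(national_error, 1))
```

Here one region is over-estimated and another under-estimated, so the national error comes out smaller than any of the regional ones. The effect depends on errors pointing in different directions: if every region were biased the same way, aggregation would not reduce the error.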

And it beats simply extrapolating the past — when it counts. A five-year average is a hard baseline to improve on in a normal season: when yields land near trend, history is already close. The value shows up in the divergent years — the drought and shock seasons that move markets. There, where the historical average breaks down, our 2025 estimates tracked the actual outcome in every divergent case we checked. The case only gets stronger at the regional level: a national average is steadied by regional highs and lows cancelling out, but a single region has no such cushion — it swings far more, so a line drawn from the past is an even weaker guide there.
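One way to frame this comparison is sketched below: take the five-year mean as the baseline, flag a season as divergent when the official yield departs from that mean by more than a threshold, and compare the baseline's error with the in-season estimate's error. The 10% threshold, the function names and every number are hypothetical.

```python
from statistics import mean


def five_year_baseline(history):
    """Mean of the last five official yields: the 'extrapolate the past' guess."""
    return mean(history[-5:])


def pct_error(estimate, official):
    """Absolute percentage error of an estimate against the official figure."""
    return abs(estimate - official) / official * 100.0


def is_divergent(official, history, threshold_pct=10.0):
    """True when the season departs from the five-year mean by more than the threshold."""
    baseline = five_year_baseline(history)
    return abs(official - baseline) / baseline * 100.0 > threshold_pct


# Hypothetical region: five normal seasons, then a drought year.
history = [7.0, 7.2, 6.8, 7.1, 6.9]   # official yields, t/ha
official_now = 5.0                     # published after harvest
estimate_now = 5.4                     # in-season model estimate

print(is_divergent(official_now, history))
print(round(pct_error(five_year_baseline(history), official_now), 1))
print(round(pct_error(estimate_now, official_now), 1))
```

In a season that lands near trend, the two errors would be close and the baseline would be hard to beat. The gap only opens when the season breaks from history, which is the situation this toy example is built to show.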

And it's available in-season. The same estimate updates as the crop develops, so the divergence between regions is visible months before harvest — not confirmed after it. That is the difference between acting on the season and reporting on it.

The full production picture

Boundaries tell you where the fields are. Classification tells you what's growing. Yield tells you how much is coming — early enough to do something about it.

Together, that's the production picture our clients act on every week, across 35+ countries and 250M+ hectares of monitored farmland. Not three separate data products, but one chain — each layer only as strong as the one beneath it, and all three built in-house.

If you'd like to see what an early read on production looks like for your crops and your territory, book a demo.

Hyperplan delivers in-season, field-level crop intelligence to commercial and supply teams in agribusiness — market sizing, territory planning, supply-risk anticipation and procurement, across 35+ countries.
