The Head of Inference seat is now a CFO conversation.

Two years ago Head of Inference was an engineering seat. It sat under a VP Engineering, it was measured on uptime and latency, and the cost number it was responsible for was a back-of-envelope estimate the CFO did not look at closely. Today the same seat is, at the AI-native firms working with us on this search, reporting directly to the CTO with a dotted line to the CFO and a quarterly board check-in. The reason is that inference cost has become the largest controllable line on the P&L and almost everyone running an AI-native business has noticed at the same time.

What changed in eighteen months

Three things compounded. Production traffic grew faster than the cheap-inference frontier did, so cost per useful response stopped falling on its own. Boards started asking the question 'what is our inference COGS as a percentage of revenue' and discovering they did not know. And the major cloud providers' AI-specific pricing began to bifurcate sharply enough that the choice of provider, instance type and bin-packing strategy became a strategic question rather than a procurement one.
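The board question above is simple arithmetic once the inputs are agreed. A minimal sketch, assuming inference spend and revenue are measured over the same period; the function name and the figures are hypothetical and used only for illustration:

```python
def inference_cogs_pct(inference_spend: float, revenue: float) -> float:
    """Return inference COGS as a percentage of revenue for one period."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return 100.0 * inference_spend / revenue


# Hypothetical quarter: 1.2M of inference spend against 4.0M of revenue.
pct = inference_cogs_pct(1_200_000, 4_000_000)
print(f"Inference COGS: {pct:.1f}% of revenue")
```

The hard part in practice is usually not the division but agreeing what counts as inference spend, which is one reason the number tends to go unanswered.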

Inside a year, the firms that had this seat held by a strong-but-not-strategic engineering leader started losing margin to firms whose Head of Inference was thinking in cost-of-revenue terms. The hiring response is now in train across the industry.

What the seat now does

Six things. Manages the relationship with cloud providers at a level that involves quarterly business reviews, not ticket queues. Owns the inference-stack technology choice across model server, batching, quantisation, KV-cache management, and increasingly custom silicon evaluation. Sets the cost-per-request targets that product teams plan against. Negotiates capacity reservations against forward demand projections. Runs the build-versus-buy decision on inference infrastructure. Sits in the board conversation when COGS is on the agenda.
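One way to picture how a cost-per-request target could follow from a COGS target is to divide the allowed inference budget by forecast request volume. This is a sketch under stated assumptions, not a description of any firm's method; the function name, parameters and figures are hypothetical:

```python
def cost_per_request_target(
    forecast_revenue: float,
    cogs_target_pct: float,
    forecast_requests: int,
) -> float:
    """Return the maximum average cost per request that keeps
    inference COGS at or below the target share of revenue."""
    if forecast_requests <= 0:
        raise ValueError("forecast_requests must be positive")
    if not 0 <= cogs_target_pct <= 100:
        raise ValueError("cogs_target_pct must be between 0 and 100")
    allowed_spend = forecast_revenue * cogs_target_pct / 100.0
    return allowed_spend / forecast_requests


# Hypothetical plan: 4.0M revenue, 20% COGS target, 200M requests.
target = cost_per_request_target(4_000_000, 20.0, 200_000_000)
print(f"Target cost per request: {target:.4f}")
```

A single blended target like this hides variation between products and request types; a real plan would likely set targets per workload.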

“Inside eighteen months this stopped being a procurement question and started being a strategic question. The org chart has not caught up.”

Why the org chart has not caught up

Because the candidate pool is small and the founders who need this hire most are typically the ones least equipped to run the search. The pool is some combination of: senior ex-cloud-provider engineers who have left the AWS or GCP AI-infra teams; ex-quant-finance infrastructure leaders who have learned to think about cost-per-trade and translate it to cost-per-token; and a small number of high-output ICs who, having scaled a single firm's inference twice, are now ready to step up.

None of these candidates show up in standard VP Engineering searches because the title does not fit and the comp structure they require does not align with how engineering ladders are paid. Most firms in this position have to design a bespoke compensation package around equity and outcome-linked cash to land the hire.

What the brief should include

Three things the brief usually misses. First, the explicit COGS target the seat will own: not the latency target, the COGS target. Second, the named CFO peer this person will partner with; the seat does not function without that. Third, an acknowledgement that the search may take six to nine months. Boards that go in pretending it will close in three months end up opening a second search.

Where this is heading

Inference economics will keep tightening as the marginal user becomes less valuable and the marginal request becomes more expensive. The firms that have a strong, board-facing inference leader by the end of 2026 will have a structural margin advantage over those that do not. The seat is becoming what Head of Trading became at exchanges in the 2010s: a discipline that, once visible, never disappears from the board agenda again.

