TL;DR

A recent whitepaper from Google emphasizes that the core of AI development is not the model itself but the surrounding harness and verification processes. This shift in focus has significant implications for how organizations build and maintain AI systems.

The whitepaper, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, claims that the model accounts for only about 10% of an AI agent’s behavior. The paper argues that the harness, verification, and context engineering are far more influential, and that organizations should change how they approach AI systems accordingly.

The whitepaper, titled The New SDLC With Vibe Coding, reports that 85% of professional developers use AI coding agents regularly, with 51% using them daily, and that 41% of all new code is AI-generated. Its key insight, though, is that the model is only a small piece of the overall system. The authors argue that most agent failures stem from configuration issues (a missing tool, a vague rule, or a noisy context) rather than from the limits of the model.

One of the paper’s core messages is that the ‘harness’ (the prompts, tools, policies, and observability surrounding the model) determines roughly 90% of an agent’s behavior. The paper’s headline evidence: changing only the harness, with the same underlying model, moved an agent from outside the Top 30 to the Top 5 on Terminal Bench 2.0.
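
The "Agent = Model + Harness" idea can be illustrated with a minimal sketch. Everything here is hypothetical: `Harness`, `Agent`, and `toy_model` are illustrative names rather than anything taken from the whitepaper, and the toy model is a stand-in for a real model API. The point is only that the same model can behave differently when the surrounding harness changes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a model API: takes a prompt string, returns text.
Model = Callable[[str], str]


@dataclass
class Harness:
    """Everything around the model: instructions, tools, and context."""
    system_prompt: str
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    context: List[str] = field(default_factory=list)

    def build_prompt(self, task: str) -> str:
        # Assemble the full prompt the model will actually see.
        parts = [self.system_prompt]
        if self.tools:
            parts.append("Tools: " + ", ".join(sorted(self.tools)))
        parts.extend(self.context)
        parts.append("Task: " + task)
        return "\n".join(parts)


@dataclass
class Agent:
    model: Model
    harness: Harness

    def run(self, task: str) -> str:
        return self.model(self.harness.build_prompt(task))


def toy_model(prompt: str) -> str:
    """Fake model: it can only 'use' a tool that the prompt advertises."""
    if "run_tests" in prompt:
        return "ran tests, then patched"
    return "patched without checking"


# Same model, two harnesses.
bare = Agent(toy_model, Harness(system_prompt="Fix the bug."))
equipped = Agent(
    toy_model,
    Harness(
        system_prompt="Fix the bug. Verify before you answer.",
        tools={"run_tests": lambda _: "ok"},
    ),
)

print(bare.run("login fails"))      # patched without checking
print(equipped.run("login fails"))  # ran tests, then patched
```

In a real system the tool registry would also execute tool calls; this sketch only shows the composition.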

Furthermore, the paper stresses that cost and efficiency in AI development are driven by how well teams engineer their context and harness, not just by adopting newer, larger models. It advocates for a disciplined approach—agentic engineering—that incorporates verification, structured context, and modular skills, which can be more cost-effective over time.

At a glance

Report · When: published early 2026

The development: Google’s new whitepaper asserts that the AI model constitutes only about 10% of system behavior, emphasizing the importance of harness and verification in AI development.

The Model Is Only 10% — The New SDLC With Vibe Coding

AI Dispatch · Field Notes

Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary — the differentiator is how outputs get verified

Vibe Coding

Casual prompts · “does it seem to work?” · disposable code · high risk

Structured AI-Assisted

Detailed prompts + constraints · manual testing · features in real codebases

Agentic Engineering

Formal specs · automated tests + evals + CI gates · production scale · low risk

Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding — however clever the prompt.
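
One way to picture "tests plus evals" as a single gate is the sketch below. It assumes deterministic tests are plain callables returning a boolean and that eval scores arrive as numbers between 0 and 1 from some grader. The `run_gate` function and the 0.8 default threshold are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GateResult:
    tests_passed: bool
    eval_score: float
    shipped: bool


def run_gate(
    tests: List[Callable[[], bool]],
    eval_scores: List[float],
    eval_threshold: float = 0.8,  # hypothetical default; tune per project
) -> GateResult:
    """Ship only if every deterministic test passes AND the mean eval score clears the threshold."""
    tests_passed = all(test() for test in tests)
    # With no evals at all, the score is 0.0, so the gate stays closed.
    eval_score = sum(eval_scores) / len(eval_scores) if eval_scores else 0.0
    shipped = tests_passed and eval_score >= eval_threshold
    return GateResult(tests_passed, eval_score, shipped)


def slugify(title: str) -> str:
    """Example of agent-written code under test."""
    return "-".join(title.lower().split())


result = run_gate(
    tests=[lambda: slugify("Hello World") == "hello-world"],
    eval_scores=[1.0, 1.0, 0.5],  # hypothetical grader outputs
)
print(result.shipped)  # True
```

Passing the tests alone is not enough in this sketch: an empty `eval_scores` list keeps the gate closed, which mirrors the claim that both kinds of verification are needed.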

The idea worth building your strategy around

Agent = Model + Harness

MODEL ~10%

HARNESS ~90%: prompts · tools · context · hooks · sandboxes · observability

~90% IS YOUR SURFACE AREA, NOT THE PROVIDER’S

Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.

“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.
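
The three failure types named in that quote can be checked mechanically before an agent runs. The sketch below is a rough heuristic under stated assumptions: the config layout (`tools`, `rules`, `context` keys), the list of vague phrases, and the context budget are all hypothetical and would need tuning for a real harness.

```python
from typing import Dict, List

# Hypothetical vague phrases and context budget; tune both for your own harness.
VAGUE_PHRASES = ("be careful", "as needed", "when appropriate", "use best judgment")
MAX_CONTEXT_CHARS = 8_000


def preflight(required_tools: List[str], config: Dict) -> List[str]:
    """Return a list of human-readable problems found in a harness config."""
    problems = []

    # 1. Missing tool: the task needs something the harness does not expose.
    available = set(config.get("tools", []))
    for tool in required_tools:
        if tool not in available:
            problems.append(f"missing tool: {tool}")

    # 2. Vague rule: instructions the agent cannot act on precisely.
    for rule in config.get("rules", []):
        lowered = rule.lower()
        if any(phrase in lowered for phrase in VAGUE_PHRASES):
            problems.append(f"vague rule: {rule!r}")

    # 3. Noisy context: more material than the (assumed) budget allows.
    context_size = sum(len(chunk) for chunk in config.get("context", []))
    if context_size > MAX_CONTEXT_CHARS:
        problems.append(f"noisy context: {context_size} chars exceeds {MAX_CONTEXT_CHARS}")

    return problems


config = {
    "tools": ["read_file"],
    "rules": ["Be careful with migrations"],
    "context": ["x" * 9000],
}
for problem in preflight(["read_file", "run_tests"], config):
    print(problem)
```

Character counts are a crude proxy for noise; a token count or relevance score would be a natural replacement.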

The economics: it’s a token-cost problem (CapEx vs OpEx)

Vibe Coding

Low CapEx · High OpEx

Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.

Agentic Engineering

High CapEx · Low OpEx

Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing — cheap models for the easy work.
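
The routing lever can be sketched as a small function that sends easy work to a cheap model and escalates on simple difficulty signals. The model names, prices, keywords, and the file-count threshold below are hypothetical placeholders, not figures from the paper or from any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str
    usd_per_1k_tokens: float


# Hypothetical model names and prices, for illustration only.
CHEAP = Route("small-model", 0.001)
STRONG = Route("large-model", 0.03)

# Hypothetical difficulty signals.
HARD_KEYWORDS = ("refactor", "migrate", "architecture", "concurrency", "security")
MAX_EASY_FILES = 3


def route(task: str, files_touched: int) -> Route:
    """Send easy work to the cheap model; escalate when the task looks hard."""
    lowered = task.lower()
    looks_hard = files_touched > MAX_EASY_FILES or any(k in lowered for k in HARD_KEYWORDS)
    return STRONG if looks_hard else CHEAP


def estimated_cost(chosen: Route, tokens: int) -> float:
    """Rough cost estimate in USD for a given token count."""
    return chosen.usd_per_1k_tokens * tokens / 1000


print(route("rename a variable", files_touched=1).model)         # small-model
print(route("Refactor the auth module", files_touched=1).model)  # large-model
```

Keyword matching is the simplest possible classifier; the design point is that the routing decision lives in the harness, so it can be changed without touching either model.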

85% of devs use AI coding agents (51% daily)

41% of all new code is AI-generated

~90% of agent behavior is the harness, not the model

+19% longer on some tasks (METR); verification is the cost

The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

thorstenmeyerai.com

Implications for AI Development Strategies

This shift redefines where organizations should invest in AI. Instead of chasing the latest models, the paper encourages companies to build robust harnesses and verification processes. That approach can produce more reliable, cost-effective systems and a competitive advantage, because the harness is the durable part of the system that the organization itself controls. The paper also suggests that the cost of model upgrades matters less than the cost of poor configuration and context management.

Evolution of AI System Design and Best Practices

The paper builds on the ongoing evolution of AI engineering from vibe coding (quick prompts with minimal oversight) to agentic engineering, which is characterized by formal specifications, automated testing, and structured context. Since early 2026, AI adoption has surged, with a majority of developers integrating AI tools into their workflows, but the core challenge remains: how to reliably control and verify AI outputs. Earlier efforts focused on acquiring larger models, but the experiments cited in the paper suggest that tuning the harness can yield greater performance improvements.

This development aligns with broader trends in software engineering, emphasizing verification, modularity, and cost management over raw model size. The whitepaper’s findings reinforce the idea that the real skill lies in context engineering and system configuration.

“The model is only 10% of what determines behavior; the harness is 90%. The behavior you experience is dominated by scaffolding you can build, own, and improve.”
— Addy Osmani

Remaining Questions on Practical Implementation

While the paper provides compelling evidence that harness and verification are critical, it remains unclear how organizations will scale these practices across diverse AI applications. The precise methods for measuring and optimizing harness effectiveness are still evolving, and the impact of different types of tasks or models requires further study. Additionally, how quickly industries will adopt these insights and shift their investment priorities is uncertain.

Next Steps for AI Teams and Industry Adoption

Organizations should prioritize developing and testing their harnesses, including prompts, tools, and verification frameworks, to improve AI reliability and cost-efficiency. Future research will likely focus on standardized metrics for harness quality and best practices for context engineering. Industry leaders may begin to publish case studies demonstrating the tangible benefits of this disciplined approach, accelerating adoption.
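
Until standardized metrics exist, one simple starting point is a pass rate measured on a fixed set of cases, with the model held constant and only the harness varied. The sketch below assumes each harness variant is exposed as a callable from task to output. The names `pass_rate` and `compare` and the toy variants are illustrative, not an established benchmark API.

```python
from typing import Callable, Dict, List, Tuple

# A case is (task, checker); the checker decides whether an output is acceptable.
Case = Tuple[str, Callable[[str], bool]]


def pass_rate(run: Callable[[str], str], cases: List[Case]) -> float:
    """Fraction of cases whose output passes its checker."""
    if not cases:
        raise ValueError("need at least one case")
    passed = sum(1 for task, check in cases if check(run(task)))
    return passed / len(cases)


def compare(variants: Dict[str, Callable[[str], str]], cases: List[Case]) -> Dict[str, float]:
    """Score each harness variant on the same cases; the caller keeps the model fixed."""
    return {name: pass_rate(run, cases) for name, run in variants.items()}


# Toy cases and toy variants standing in for real agent runs.
cases: List[Case] = [
    ("add 2 2", lambda out: out == "4"),
    ("upper hi", lambda out: out == "HI"),
]


def harness_a(task: str) -> str:
    return "4"  # only handles the first kind of task


def harness_b(task: str) -> str:
    op, *args = task.split()
    if op == "add":
        return str(sum(int(a) for a in args))
    return args[0].upper()


scores = compare({"a": harness_a, "b": harness_b}, cases)
print(scores)  # {'a': 0.5, 'b': 1.0}
```

Running the same cases against every variant is what makes the numbers comparable; changing the case set between runs would confound the result.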

Key Questions

Why is the model only 10% of the system’s behavior?

The whitepaper argues that prompts, tools, rules, and observability, collectively called the harness, determine most of an agent’s output, which makes the model itself a small part of the overall system.

How can organizations improve their AI systems based on this insight?

Focusing on building robust harnesses, verifying outputs, and managing context effectively can lead to more reliable, cost-efficient AI applications.

Does this mean larger models are less important?

Not necessarily, but the whitepaper suggests that beyond a certain point, investing in better harnesses and verification yields greater returns than simply acquiring bigger models.

What are the economic implications of this shift?

High upfront costs in designing systems and context management can reduce long-term operational costs, making AI development more sustainable and scalable.
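
The trade-off can be made concrete with a break-even calculation. If the agentic approach costs more upfront but less per feature, the totals cross at n = (extra upfront cost) / (per-feature saving). The dollar figures below are invented for illustration. Only the 3× and 10× per-feature multipliers echo the range the paper reports.

```python
import math
from typing import Optional


def break_even_features(
    upfront_vibe: float,
    upfront_agentic: float,
    per_feature_vibe: float,
    per_feature_agentic: float,
) -> Optional[int]:
    """Smallest whole number of features at which the agentic total is no higher than the vibe total."""
    if per_feature_vibe <= per_feature_agentic:
        return None  # no per-feature saving, so the upfront cost is never recovered
    extra_upfront = upfront_agentic - upfront_vibe
    if extra_upfront <= 0:
        return 0  # agentic is already cheaper before the first feature
    return math.ceil(extra_upfront / (per_feature_vibe - per_feature_agentic))


# Hypothetical numbers: 20,000 upfront for specs, evals, and context;
# 1,000 per feature afterwards, versus 3x or 10x that for vibe coding.
print(break_even_features(0, 20_000, 3_000, 1_000))   # 10
print(break_even_features(0, 20_000, 10_000, 1_000))  # 3
```

Under these made-up inputs the investment pays back after 10 features at a 3× cost gap and after 3 features at a 10× gap, so the larger the per-feature gap, the sooner the upfront spend is recovered.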

Source: ThorstenMeyerAI.com

Author

PepperEyes Team
