
TL;DR

A recent whitepaper from Google emphasizes that the core of AI development is not the model itself but the surrounding harness and verification processes. This shift in focus has significant implications for how organizations build and maintain AI systems.

A new whitepaper from Google, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, argues that the AI model accounts for only about 10% of an AI agent's behavior. The paper contends that the harness, verification, and context engineering are far more influential, which marks a paradigm shift in how organizations should approach AI systems.

The whitepaper, titled The New SDLC With Vibe Coding, reports that 85% of professional developers use AI coding agents regularly, with just over half (51%) using them daily, and that 41% of all new code is AI-generated. Its key insight, however, is that the model itself is only a small piece of the overall system. The authors find that most agent failures stem from configuration issues (missing tools, vague rules, or noisy context) rather than from the model's capabilities.

One of the paper’s core messages is that the ‘harness’ (the prompts, tools, policies, and observability surrounding the model) determines roughly 90% of the behavior. The evidence is concrete: in one result the paper cites, changing only the harness moved an agent from outside the top 30 to the top 5 on Terminal Bench 2.0, with the same underlying model.
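To make "same model, different harness" concrete, here is a minimal sketch that holds a model fixed and scores two harness variants on the same task set. Everything in it is hypothetical: `toy_model` is a deliberately contrived stand-in for a real model API, and the `Harness` class, task list, and prompts are illustrative only, not anything from the paper or from Terminal Bench.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical, contrived model: a fixed function from prompt to output.
# It only answers in the expected format when the prompt asks for it,
# mimicking how an instruction in the harness changes observed behavior.
def toy_model(prompt: str) -> str:
    question = prompt.rsplit("Q:", 1)[-1].strip()
    a, b = (int(x) for x in question.split("+"))
    answer = str(a + b)
    if "Answer with digits only" in prompt:
        return answer
    return f"The answer is {answer}."

@dataclass
class Harness:
    name: str
    system_prompt: str

    def run(self, model: Callable[[str], str], question: str) -> str:
        # The harness, not the model, decides what the model actually sees.
        return model(f"{self.system_prompt}\nQ: {question}")

def pass_rate(harness: Harness, model: Callable[[str], str], tasks) -> float:
    # Exact-match grading: a deterministic check for a deterministic task.
    passed = sum(harness.run(model, q) == expected for q, expected in tasks)
    return passed / len(tasks)

tasks = [("2+2", "4"), ("10+5", "15"), ("7+8", "15")]
baseline = Harness("baseline", "You are a helpful assistant.")
tuned = Harness("tuned", "You are a calculator. Answer with digits only.")

for h in (baseline, tuned):
    print(h.name, pass_rate(h, toy_model, tasks))
```

The model function never changes between the two runs; only the system prompt does, yet the pass rate goes from 0.0 to 1.0. Real gains are rarely this clean, but the experimental design (fix the model, vary the harness, score on a shared task set) is the same.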

Furthermore, the paper stresses that cost and efficiency in AI development are driven by how well teams engineer their context and harness, not just by adopting newer, larger models. It advocates for a disciplined approach—agentic engineering—that incorporates verification, structured context, and modular skills, which can be more cost-effective over time.
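The "more cost-effective over time" claim is a break-even argument: one approach pays more upfront and less per feature. A back-of-envelope sketch, assuming a simple linear cost model (total = upfront cost + per-feature cost × features shipped); the function names and all numbers are hypothetical and do not come from the paper.

```python
import math

def total_cost(capex: float, opex_per_feature: float, features: int) -> float:
    """Upfront investment plus the running cost of shipping `features` features."""
    return capex + opex_per_feature * features

def break_even_features(capex_a: float, opex_a: float, capex_b: float, opex_b: float) -> int:
    """Smallest feature count at which approach B costs no more in total than approach A.

    Assumes B is the high-upfront, low-per-feature option.
    """
    if opex_b >= opex_a:
        raise ValueError("B never catches up: its per-feature cost is not lower than A's")
    return max(0, math.ceil((capex_b - capex_a) / (opex_a - opex_b)))

# Hypothetical numbers in arbitrary cost units; none of these come from the paper.
VIBE_CAPEX, VIBE_OPEX = 0.0, 50.0           # cheap to start, pays later in fix-it loops and rework
AGENTIC_CAPEX, AGENTIC_OPEX = 1000.0, 10.0  # specs, evals, and context paid for upfront

n = break_even_features(VIBE_CAPEX, VIBE_OPEX, AGENTIC_CAPEX, AGENTIC_OPEX)
print(n)  # 25
print(total_cost(VIBE_CAPEX, VIBE_OPEX, n), total_cost(AGENTIC_CAPEX, AGENTIC_OPEX, n))  # 1250.0 1250.0
```

With these made-up figures the upfront investment pays for itself after 25 features; below that, the cheap start wins. The shape of the trade-off, not the specific numbers, is the point.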

At a glance
When: published May 2026
The development: Google’s new whitepaper asserts that the AI model constitutes only about 10% of system behavior, emphasizing the importance of harness and verification in AI development.
The Model Is Only 10% — The New SDLC With Vibe Coding
AI Dispatch · Field Notes
Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary — the differentiator is how outputs get verified
Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk
Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding — however clever the prompt.
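The split between tests and evals can be shown in a few lines. This is a minimal sketch, assuming a toy rubric: `slugify`, `RUBRIC`, `score`, and `eval_gate` are hypothetical names, and the keyword-based `score` is a stand-in for the LLM or human graders a real eval suite would use.

```python
from statistics import mean

def slugify(title: str) -> str:
    """Deterministic helper: same input, same output, every time."""
    return "-".join(title.lower().split())

# A test: exact, binary, deterministic.
def test_slugify():
    assert slugify("The New SDLC") == "the-new-sdlc"

# An eval: scores open-ended outputs against a rubric and gates on a threshold.
RUBRIC = ["harness", "verification"]  # hypothetical concepts a good summary must mention

def score(summary: str) -> float:
    """Fraction of rubric terms mentioned; a crude stand-in for an LLM or human grader."""
    text = summary.lower()
    return sum(term in text for term in RUBRIC) / len(RUBRIC)

def eval_gate(outputs: list[str], threshold: float = 0.8) -> bool:
    """Pass the CI gate only if the average rubric score clears the threshold."""
    return mean(score(o) for o in outputs) >= threshold

test_slugify()
samples = [
    "The harness matters more than the model; verification closes the loop.",
    "Verification and a good harness beat a bigger model.",
]
print(eval_gate(samples))  # True
```

The test either passes or fails; the eval produces a score over a sample of outputs and only fails the build when quality drifts below a chosen threshold, which suits outputs that legitimately vary from run to run.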
The idea worth building your strategy around
Agent = Model + Harness
MODEL: ~10%
HARNESS: ~90% (prompts · tools · context · hooks · sandboxes · observability). That 90% is your surface area, not the provider’s.
Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.
“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.
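If most failures are configuration failures, some of them can be caught before the agent ever runs. The sketch below is a hypothetical harness config plus a linter for the three failure types named in the quote. The `HarnessConfig` fields, the `VAGUE_WORDS` list, and the context-file budget are all illustrative assumptions, not a schema from the paper or any agent framework.

```python
from dataclasses import dataclass, field

# Hypothetical phrases that tend to make a rule unenforceable.
VAGUE_WORDS = {"appropriate", "good", "properly", "as needed", "best"}

@dataclass
class HarnessConfig:
    """Everything around the model: the part the paper says you own."""
    model: str
    system_prompt: str
    rules: list[str] = field(default_factory=list)
    tools: set[str] = field(default_factory=set)
    context_files: list[str] = field(default_factory=list)
    max_context_files: int = 20  # hypothetical budget against noisy context

def lint(cfg: HarnessConfig, required_tools: set[str]) -> list[str]:
    """Flag a missing tool, a vague rule, or a noisy context."""
    problems = []
    for tool in sorted(required_tools - cfg.tools):
        problems.append(f"missing tool: {tool}")
    for rule in cfg.rules:
        # Simple substring match; a real check would be more careful.
        if any(word in rule.lower() for word in VAGUE_WORDS):
            problems.append(f"vague rule: {rule!r}")
    if len(cfg.context_files) > cfg.max_context_files:
        problems.append(f"noisy context: {len(cfg.context_files)} files > {cfg.max_context_files}")
    return problems

cfg = HarnessConfig(
    model="any-model",  # placeholder; swapping it does not change any of the checks
    system_prompt="You maintain a Python service.",
    rules=["Run pytest before committing.", "Write good code."],
    tools={"read_file", "edit_file"},
    context_files=[f"src/module_{i}.py" for i in range(30)],
)
for p in lint(cfg, required_tools={"read_file", "edit_file", "run_tests"}):
    print(p)
```

Note that none of the checks look at the `model` field: every problem flagged here lives in the harness.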
The economics: it’s a token-cost problem (CapEx vs OpEx)
Vibe Coding
Low CapEx · High OpEx
Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Over time it crosses over and ends up costing 3–10× more per feature.
Agentic Engineering
High CapEx · Low OpEx
Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing — cheap models for the easy work.
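"Intelligent model routing" can be as simple as a classifier in front of two model tiers. A minimal sketch, assuming a keyword heuristic for task difficulty: the tier names, per-token prices, keyword list, and fixed token count per task are all hypothetical, and a production router would more likely use a learned classifier or escalate on failed verification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelTier:
    name: str                  # hypothetical model identifier
    cost_per_1k_tokens: float  # hypothetical price, arbitrary units

CHEAP = ModelTier("small-fast-model", 0.1)
STRONG = ModelTier("large-reasoning-model", 2.0)

def difficulty(task: str) -> int:
    """Crude heuristic: count keywords suggesting the task needs deeper reasoning."""
    signals = ["refactor", "architecture", "concurrency", "security", "migrate"]
    return sum(s in task.lower() for s in signals)

def route(task: str) -> ModelTier:
    """Send easy work to the cheap tier; escalate only when a difficulty signal appears."""
    return STRONG if difficulty(task) >= 1 else CHEAP

def estimate_cost(tasks: list[str], tokens_per_task: int = 4000) -> float:
    """Total cost if every task uses the same (hypothetical) number of tokens."""
    return sum(route(t).cost_per_1k_tokens * tokens_per_task / 1000 for t in tasks)

tasks = [
    "Rename variable in utils.py",
    "Add docstrings to api.py",
    "Fix typo in README",
    "Migrate auth module to new security library",
]
for t in tasks:
    print(f"{route(t).name:<22} {t}")
print(estimate_cost(tasks))
```

With these made-up prices the mixed route costs about 9.2 units, versus 32 if all four tasks went to the strong tier. The saving only holds if the cheap tier actually passes verification on the easy work, which is why routing and evals go together.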
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer to complete some tasks (METR): verification is the cost
The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.
thorstenmeyerai.com

Implications for AI Development Strategies

This shift in understanding redefines how organizations should invest in AI. Instead of chasing the latest models, companies are encouraged to focus on building robust harnesses and verification processes. This approach can lead to more reliable, cost-effective AI systems and a competitive advantage, as the durable, scalable parts of AI are within their control. The insight also suggests that costs associated with model upgrades are less impactful than those incurred by poor configuration and context management.


Evolution of AI System Design and Best Practices

The paper builds on the ongoing evolution of AI engineering, which has moved from vibe coding (quick prompts with minimal oversight) to agentic engineering, characterized by formal specifications, testing, and structured context. Since early 2026, AI adoption has surged, with a majority of developers integrating AI tools into their workflows, but the core challenge remains: how to reliably control and verify AI outputs. The earlier focus was on adopting ever-larger models; the experiments the paper cites suggest that tuning the harness can yield larger performance gains.

This development aligns with broader trends in software engineering, emphasizing verification, modularity, and cost management over raw model size. The whitepaper’s findings reinforce the idea that the real skill lies in context engineering and system configuration.
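Context engineering, at its simplest, means deciding what goes into the model's limited context window and what stays out. The sketch below assumes a greedy selection strategy with a rough 4-characters-per-token estimate and naive keyword overlap as the relevance score; all three are hypothetical simplifications standing in for a real tokenizer and retriever, and the document names are made up.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate of ~4 characters per token; a real harness would use the model's tokenizer."""
    return max(1, len(text) // 4)

def relevance(query: str, doc: str) -> int:
    """Naive keyword overlap; a stand-in for embeddings or a proper retriever."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def build_context(query: str, docs: dict[str, str], budget: int) -> list[str]:
    """Greedily pack the most relevant documents that fit within a token budget.

    Documents sharing no keywords with the query are skipped outright,
    which is one simple way to keep noisy context out of the prompt.
    """
    scores = {name: relevance(query, text) for name, text in docs.items()}
    chosen, used = [], 0
    for name in sorted(docs, key=lambda n: scores[n], reverse=True):
        if scores[name] == 0:
            break  # the list is sorted, so everything after this is irrelevant too
        cost = estimate_tokens(docs[name])
        if used + cost <= budget:
            chosen.append(name)
            used += cost
    return chosen

docs = {
    "auth.md": "login tokens expire after one hour and refresh via the auth service",
    "billing.md": "invoices are generated monthly",
    "style.md": "use type hints and keep functions short",
}
print(build_context("why do login tokens expire", docs, budget=50))  # ['auth.md']
```

The billing and style documents never reach the model for this query; keeping them out is as much a part of context engineering as putting the right document in.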

“The model is only 10% of what determines behavior; the harness is 90%. The behavior you experience is dominated by scaffolding you can build, own, and improve.”

— Addy Osmani


Remaining Questions on Practical Implementation

While the paper provides compelling evidence that harness and verification are critical, it remains unclear how organizations will scale these practices across diverse AI applications. The precise methods for measuring and optimizing harness effectiveness are still evolving, and the impact of different types of tasks or models requires further study. Additionally, how quickly industries will adopt these insights and shift their investment priorities is uncertain.


Next Steps for AI Teams and Industry Adoption

Organizations should prioritize developing and testing their harnesses, including prompts, tools, and verification frameworks, to improve AI reliability and cost-efficiency. Future research will likely focus on standardized metrics for harness quality and best practices for context engineering. Industry leaders may begin to publish case studies demonstrating the tangible benefits of this disciplined approach, accelerating adoption.
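Until standardized metrics exist, teams can start with simple numbers drawn from their own agent logs. The sketch below assumes a hypothetical `Run` log record and computes three illustrative indicators: first-pass success rate (echoing the paper's emphasis on first-pass success), mean attempts per task, and total tokens spent per solved task. None of these are metrics defined by the paper.

```python
from dataclasses import dataclass

@dataclass
class Run:
    """One agent attempt at a task, as a harness might log it (hypothetical schema)."""
    task_id: str
    attempt: int   # 1 for the first try, 2 for the first retry, ...
    passed: bool   # did tests/evals accept the output?
    tokens: int

def harness_metrics(runs: list[Run]) -> dict[str, float]:
    """Summarize run logs into simple harness-quality indicators."""
    by_task: dict[str, list[Run]] = {}
    for r in runs:
        by_task.setdefault(r.task_id, []).append(r)
    if not by_task:
        raise ValueError("no runs to summarize")
    first_pass = solved = attempts = tokens = 0
    for task_runs in by_task.values():
        task_runs.sort(key=lambda run: run.attempt)
        attempts += len(task_runs)
        tokens += sum(run.tokens for run in task_runs)
        if task_runs[0].passed:
            first_pass += 1
        if any(run.passed for run in task_runs):
            solved += 1
    n = len(by_task)
    return {
        "first_pass_rate": first_pass / n,
        "mean_attempts": attempts / n,
        # All tokens, including those burned on failed tasks, are charged to the solved ones.
        "tokens_per_solved_task": tokens / solved if solved else float("inf"),
    }

runs = [
    Run("t1", 1, True, 3000),
    Run("t2", 1, False, 4000), Run("t2", 2, True, 2500),
    Run("t3", 1, False, 5000), Run("t3", 2, False, 5000),
]
print(harness_metrics(runs))
```

Tracking these before and after a harness change (same model, same task set) gives a rough, local answer to whether the change helped, even without an industry-wide standard.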


Key Questions

Why is the model only 10% of the system’s behavior?

The whitepaper argues that the factors surrounding the model (prompts, tools, rules, and observability, collectively called the harness) determine most of the AI’s output, making the model itself a small part of the overall system.

How can organizations improve their AI systems based on this insight?

Focusing on building robust harnesses, verifying outputs, and managing context effectively can lead to more reliable, cost-efficient AI applications.

Does this mean larger models are less important?

Not necessarily, but the whitepaper suggests that beyond a certain point, investing in better harnesses and verification yields greater returns than simply acquiring bigger models.

What are the economic implications of this shift?

Higher upfront investment in specs, evals, and context management can lower long-term operational costs (fewer fix-it loops and less rework), making AI development more sustainable and scalable.

Source: ThorstenMeyerAI.com
