Agent Labs Are Eating the Software World

TL;DR >> Agent labs ship product first and build infrastructure later. They turn LLMs into goal-directed systems that deliver outcomes, not just outputs. This product-first approach is capturing the real value in the AI stack. <<

I’ve been watching this pattern emerge for months, and it’s finally clicking into place. The AI startups that are actually winning aren’t the ones building bigger models—they’re the ones shipping products that solve real problems.

Let me explain what I’m seeing.

# The Real AI Divide

Last week I was testing yet another AI coding tool, and something hit me: these aren’t just wrappers around the latest LLM. They’re fundamentally different companies with different philosophies, different timelines, and completely different approaches to building value.

There’s a split happening in the AI world right now, and understanding it is crucial whether you’re building, investing, or just trying to figure out where this whole thing is going.

Model labs are building foundation models. They’re in the R&D business, spending years and billions training the next GPT-whatever before they even think about products.

Agent labs are shipping products today. They take existing frontier models and turn them into goal-directed systems that actually get stuff done.

As Swyx puts it:

Agent labs ship product first, and then work their way down as they get data, revenue and conviction and deep understanding of their problem domain.

The difference isn’t just technical—it’s cultural, financial, and strategic.

Agent labs are also more realistic about capabilities. As Karpathy notes, “My critique of the industry is more in overshooting the tooling w.r.t. present capability.”

# What Makes an Agent Lab

I spent time digging into what Swyx calls “agent labs” and here’s what I’ve learned from watching companies like Cognition (Devin), Cursor, and Factory AI:

They ship first, optimize later. While model labs are in multi-year R&D cycles, agent labs are shipping products in weeks and iterating based on real user feedback.

They own the full workflow. Model labs see prompts and responses. Agent labs see the entire trace—file changes, tool calls, test results, user approvals. That operational data is their moat.

They’re domain-specific. Instead of trying to build general intelligence, they focus on specific domains where there’s still “lots of work remaining” - the integration work, the domain expertise, the grunt work that Karpathy emphasizes as the real challenge.

They deliver outcomes, not outputs. This is the key insight. You’re not paying for AI tokens—you’re paying for deployed applications, closed tickets, shipped features, or resolved bugs.

# Why Product-First Beats Model-First

Here’s what I’ve seen in the wild: companies that start with products have a massive advantage over those that start with models.

The Data Advantage

When Cursor helps you write code, they capture everything: your repository structure, your coding patterns, your acceptance criteria, the files you modify, the tests you run. They’re building a dataset that OpenAI and Anthropic can never access.

When Devin builds a feature, they capture the entire development workflow: planning, implementation, testing, deployment. That’s proprietary training data worth more than any publicly available dataset.

The Feedback Loop

Agent labs design surfaces that emit metrics worth optimizing. Tests pass, features ship, bugs get fixed. These become reinforcement signals that are impossible to replicate at the model layer.

OpenAI can optimize for next-token prediction. Cursor can optimize for “feature completion rate.” Which one do you think drives more business value?

The Revenue Reality

Model labs need billions in funding and years of R&D before they see revenue. Agent labs can start charging in weeks.

I’ve watched this with tools like AMP Code and Cursor. They’re charging real money for real value delivered today, not promising AGI tomorrow.

# The Architecture That’s Winning

Every successful agent lab I’ve studied converges on the same core architecture:

  • Reasoning layer - Planning, reflection, decomposition
  • Memory system - Long-term context and recall
  • Tool execution - APIs, databases, code, systems
  • Control loops - Self-evaluation, retry, improvement

Around these cores, they invest in what matters: context engineering, multi-agent orchestration, evaluation frameworks, and observability.

The result isn’t just better chatbots—it’s autonomous systems with bounded autonomy that can execute end-to-end workflows.

# The Evaluation Layer That Matters

Here’s something that surprised me: agent labs invest more in evaluation and guardrails than in model improvement.

Why? Because reliability trumps raw intelligence every time.

I’ve seen agent systems that fail 30% of the time with brilliant reasoning, and systems that succeed 95% of the time with basic logic. Customers pay for the 95% success rate, not the brilliant failures.

Top labs build comprehensive eval harnesses covering:

  • Reliability: Task success rates, test pass rates
  • Quality: Hallucination rates, plan completeness
  • Efficiency: Cost per successful task, p99 latency
  • Safety: Guardrail triggers, escalation rates
  • User impact: Satisfaction, rollback rates

# The Competitive Moat

I used to think the big model labs would eventually crush everyone else. Now I’m not so sure.

Agent labs have defensive moats that model labs can’t replicate:

  • Workflow data - They see how work actually gets done in organizations
  • Domain expertise - They understand the nuances of specific industries
  • User relationships - They own the customer relationship and usage patterns
  • Evaluation infrastructure - They’ve built systems to measure what matters

OpenAI can always build a better model. But can they build a better software development workflow than Cursor? Can they understand customer support better than a specialized agent lab?

# The Playbook I’m Seeing

After studying dozens of these companies, I’ve identified the pattern:

  • Stage 1: Start as API consumer. Use existing models with smart orchestration.
  • Stage 2: Capture traces and tool usage data. Build eval harnesses.
  • Stage 3: Train narrow models for specific tasks (embeddings, routers, autocomplete).
  • Stage 4: Run fine-tuning on captured signals.
  • Stage 5: Gradually develop proprietary models for your domain.

This top-down evolution lets them de-risk R&D and compound their data advantages while generating revenue from day one.

# Why This Matters for You

If you’re building AI products, the agent lab model is worth studying closely.

For founders: You don’t need billions in funding or a team of PhD researchers. You need a deep understanding of a domain and the ability to build reliable workflows on top of existing models.

For developers: The most valuable skills are shifting from model architecture to system design, evaluation engineering, and domain-specific workflow optimization.

For investors: Look for companies that capture workflow data and have clear evaluation metrics. The moat is in the data and the feedback loops, not the models themselves.

# The Decade Ahead

Swyx frames this as the shift from the “Decade of Models (2015-2025)” to the “Decade of Agents (2025-?).”

As Andrej Karpathy puts it: “This is the decade of agents.”

I think they’re both right.

The frontier is moving from raw model scaling to agentic orchestration, reliability, and integration. Value is accruing to those who own user interaction, reward signals, and operational data.

Model labs will continue pushing the boundaries of what’s possible. But agent labs will distribute those capabilities to solve real problems.

The result is a new industrial layer of agentic software companies—lean, fast, and outcome-oriented—that are transforming work from interaction to execution.

# What I’m Watching Next

I’m keeping my eye on several trends:

  • Multi-agent orchestration - Systems that decompose complex goals into specialized sub-agents
  • Recursive improvement - Agents that use agents to build better agents
  • Outcome-based pricing - Moving from token billing to value-based pricing
  • Enterprise adoption - How large organizations integrate agentic systems

The companies that figure out how to align reasoning, tools, and reward loops around human goals will define the software era ahead.

Model labs gave us intelligence. Agent labs are giving it a job description.

And that’s how the software world gets rebuilt—one agent lab at a time.


This piece draws heavily from the work of Swyx, particularly his analysis of Cognition and the agent lab thesis, as well as insights from Akash Bajwa’s writing on AI agents and product development. The synthesis and observations are mine.

Nikola Balić

I build go-to-market engines for AI driven products that matter.