SWE-1 Windsurfed by OpenAI Codex
OpenAI, Codex | Will Scardino

The $3B Codex, the aborted for-profit pivot & the battle for AI full-stack dev.

What do you get when:

  • OpenAI launches a next-gen coding agent

  • Windsurf drops its first frontier models

  • A $3B acquisition

  • And a failure to convert to a for-profit?

Only the most critical inflection point in AI software engineering since GitHub Copilot!

Code Agents & Threat Vectors
Code Agents, Hugging Face | Will Scardino

Code agents write and execute code to solve complex problems and complete tasks independently.

AI assistants, by contrast, translate natural-language prompts into code, which users then copy and paste into an IDE.

But what if your agent didn’t stop there?

Unlike coding agents (think Cursor and Windsurf), which generate code for you to run, code agents create an action plan in one shot and execute it.
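
To make the distinction concrete, here is a minimal Python sketch, not any vendor's implementation. `fake_model` is a hypothetical stand-in for an LLM call, and the restricted `exec` namespace is only an illustration; it is not a real sandbox.

```python
import contextlib
import io


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns Python source as text."""
    return "result = sum(range(1, 11))\nprint(result)"


def assistant(prompt: str) -> str:
    # An assistant stops at generation: the user copies the code and runs it.
    return fake_model(prompt)


def code_agent(prompt: str) -> str:
    # A code agent also executes what it generated and returns the output.
    source = fake_model(prompt)
    buffer = io.StringIO()
    # Assumption: only these three builtins are exposed to the generated code.
    allowed = {"sum": sum, "range": range, "print": print}
    with contextlib.redirect_stdout(buffer):
        # Running model-written code is the threat vector; this dict is not
        # a security boundary, and real agents need proper isolation.
        exec(source, {"__builtins__": allowed})
    return buffer.getvalue().strip()
```

Calling `assistant("add 1 to 10")` hands back source text, while `code_agent("add 1 to 10")` hands back whatever that source printed when it ran.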

When AI Becomes a People Pleaser
Generative AI, OpenAI | Will Scardino

One of the key qualities of a PM is being brave enough to say No, creatively.

But what about AI?

The curious case of GPT-4o’s sycophantic spiral.

OpenAI recently rolled back an update to GPT-4o that caused the model to behave like a raging Yes Man.

This incident highlights a critical gap in AI evaluation and alignment practices: standard evals often fail to detect sycophantic behavior and harmful agreement in real-world contexts.

That’s why it’s important to adhere to the best practices of Responsible AI.

The Future of Agentic Workflows with Gemini 2.5
Agentic Workflows, LangGraph | Will Scardino

For the past couple of weeks, I’ve been experimenting with two types of agentic frameworks:

  1. Workflows: predictable, code-driven pipelines with LLMs + tools.

  2. Hierarchical Agents (aka Supervisors): an agent design with dynamic feedback where a Supervisor agent guides the process by delegating to other agents; agents can take actions and make decisions with autonomy.

Supervisors have a high degree of autonomy, including making decisions around which agents to delegate to and when to end the workflow.

Agent architectures are more attractive when you want to combine pre-built ReAct agents with existing toolsets in a workflow.

Workflows are more reliable for production apps when orchestration needs to be deterministic.
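
The contrast between the two designs can be sketched in plain Python. This is an illustrative toy, not LangGraph's API: `research`, `write`, and `pick_next` are hypothetical names, and `pick_next` uses a fixed rule where a real Supervisor would ask an LLM.

```python
from typing import Callable, Dict

Agent = Callable[[str], str]


def research(text: str) -> str:
    return text + " [researched]"


def write(text: str) -> str:
    return text + " [drafted]"


def run_workflow(task: str) -> str:
    # Workflow: the order of steps is fixed in code, so every run
    # follows the same path.
    for step in (research, write):
        task = step(task)
    return task


def pick_next(state: str) -> str:
    """Hypothetical stand-in for the Supervisor's LLM decision."""
    if "[researched]" not in state:
        return "research"
    if "[drafted]" not in state:
        return "write"
    return "END"


def run_supervisor(task: str, agents: Dict[str, Agent], max_turns: int = 5) -> str:
    # Supervisor: decides at run time which agent to call and when to stop.
    # max_turns caps the loop in case the decision never returns "END".
    for _ in range(max_turns):
        choice = pick_next(task)
        if choice == "END":
            break
        task = agents[choice](task)
    return task
```

With a rule-based `pick_next` both functions produce the same result. Once that decision comes from a model, the Supervisor's path can vary between runs, which is the trade-off against a deterministic workflow.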
