The Future of Agentic Workflows with Gemini 2.5
Image: Woven Circuit by Hanna Barakat & Archival Images of AI + AIxDESIGN | Better Images of AI
TL;DR
For the past couple of weeks, I’ve been experimenting with two different types of agentic frameworks:
Workflows: predictable, code-driven pipelines with LLMs + tools.
Hierarchical Agents (aka Supervisors): an agent design with dynamic feedback, where a Supervisor agent guides the process by delegating to other agents, each of which can take actions and make decisions autonomously.
Supervisors have a high degree of autonomy, including making decisions around which agents to delegate to and when to end the workflow.
Agent architectures are more attractive when you want to combine pre-built ReAct agents with existing toolsets in a workflow.
Workflows are more reliable for production apps when orchestration needs to be deterministic.
Using Gemini 2.5 Flash + LangGraph, I hand-coded a workflow where an agent can choose (based on the user prompt) to use built-in tools such as Google web search and code execution.
While Gemini 2.5 Pro helped augment the code, I spent a lot of time debugging and fixing issues that it introduced.
Here is a preview of the capabilities. I’m cooking up a robust POC using Replit 🔥
#ComingSoon…
Allow Me to Reintroduce Myself
Honeymoon ‘23 📍 Rome, Italy
Hi, my name is Will 👋
I’m a recovering Data Scientist who later turned to data & AI Product Management.
When I was a DS, I delivered multiple AI / ML solutions into production with a focus on driving outcomes.
So yes, PMs can code.
And it enhanced my credibility with the engineering team when I was able to speak the same language, down to the package dependencies.
Let’s just say they’re much more willing to give PMs a seat at the table during architecture discussions.
After graduating from Baruch College with a Master’s degree in epidemiology (my thesis focused on predicting chronic conditions in the Medicare population), I got my first DS job at Healthfirst doing Actuarial Science.
I was self-taught, picking up most of my coding abilities on the job and through DataCamp. I eventually became fluent in multiple programming languages (mostly SQL and Python, plus derivatives) at a time when job titles such as ML Engineer, AI Product Manager, and Data Engineer weren’t a thing, at least in healthcare. I was doing all three roles.
Since then, I have >100 DataCamp courses under my belt, and I also passed the AWS Solutions Architect Associate exam.
I picked up a lot from working with senior-level developers with 20+ years of experience at Wellpartner and Accenture.
Needless to say, old habits die hard, hence the token “recovering.”
And now with the advent of vibe coding, I’m learning more about front-end languages (JavaScript, CSS, HTML) than ever before.
#UnexpectedBenefits
Thanks AI 🙏
Agentic Workflow with Built-in & External Toolset
This Python script creates an agentic workflow powered by Gemini 2.5 Flash (experimental) and orchestrated by LangGraph.
The agent-in-the-loop can use Gemini's built-in tools like Google web search and code execution.
It also has external tools for querying Wikipedia, all while incorporating memory.
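To make those two tool flavors concrete, here’s a minimal sketch (not the gist itself) of how each can be declared, assuming the google-genai and langchain-community packages; the `wiki` name is mine:

```python
from google import genai
from google.genai import types
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

# Built-in tool: Google Search executes inside the Gemini API call itself.
client = genai.Client()  # reads the API key from the environment
search_config = types.GenerateContentConfig(
    tools=[types.Tool(google_search=types.GoogleSearch())]
)

# External tool: a Wikipedia lookup the graph invokes as its own node.
wiki = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper(top_k_results=2))
```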
Here’s the gist—no pun intended!
The newer Gemini models in the 2.5 series also have reasoning capabilities.
I set my thinking budget to 500 tokens, so Google won’t break the bank 😅
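With the google-genai SDK, that cap looks roughly like this (the exact wiring in the gist may differ):

```python
from google.genai import types

# Cap Gemini 2.5's reasoning at 500 thinking tokens per request.
config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_budget=500)
)
```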
The key to the orchestration is the should_continue function: wired into the graph as a conditional edge, it decides whether to end the workflow based on whether a built-in or external tool was called or is still needed (see the sketch below).
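Here’s a minimal sketch of that routing pattern, reusing the `wiki` tool from the earlier snippet; this isn’t the gist code, and the preview model id may have changed since:

```python
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.graph import StateGraph, MessagesState, END
from langgraph.prebuilt import ToolNode

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash-preview-04-17").bind_tools([wiki])

def chatbot(state: MessagesState) -> dict:
    # One model turn; built-in tools (search, code execution) run inside this call.
    return {"messages": [llm.invoke(state["messages"])]}

def should_continue(state: MessagesState) -> str:
    last = state["messages"][-1]
    if getattr(last, "tool_calls", None):
        return "tools"  # the model requested an external tool (e.g., Wikipedia)
    return END          # nothing left to call, so end the workflow

builder = StateGraph(MessagesState)
builder.add_node("chatbot", chatbot)
builder.add_node("tools", ToolNode([wiki]))
builder.add_conditional_edges("chatbot", should_continue, {"tools": "tools", END: END})
builder.add_edge("tools", "chatbot")
```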
The workflow sets the chatbot as the entry point and keeps persistent memory throughout the conversation, so it can stream responses keyed by thread_id.
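Continuing the sketch above, a checkpointer keyed by thread_id is what makes the memory persistent across turns:

```python
from langgraph.checkpoint.memory import MemorySaver

builder.set_entry_point("chatbot")
app = builder.compile(checkpointer=MemorySaver())  # state persists per thread

config = {"configurable": {"thread_id": "demo-1"}}
for event in app.stream({"messages": [("user", "Who founded Wikipedia?")]}, config):
    print(event)
```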
Supervisor ReAct Agent Architecture
Here’s another POC I did with a ReAct Supervisor Agent Architecture using LangGraph + GPT-4.1.
Here’s the gist 🔗 https://gist.github.com/scarnyc/83513458ebc44addc76875b0d2324285
This Jupyter notebook defines an agentic workflow for writing blog posts with a multi-agent architecture where each agent represents a stage in the blog post creation process.
High-level stages include:
Planning: Agent creates a high-level outline with key details based on the user’s prompt.
Research: Agent generates search queries to gather relevant research info.
Writing: Agent writes a blog post draft to capture the audience’s attention and curiosity, adopting the style of AI by Design.
Critique: Reviewer agent assesses the draft based on the plan, providing specific critiques and recommendations for improvement.
Revision: Editor agent revises the draft strictly according to the critique to produce the final version.
The workflow is managed by a supervisor agent that delegates tasks sequentially through the planning, research, writing, critique, and revision stages, passing the relevant context between agents (a sketch of the pattern follows).
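For illustration, here’s roughly what that pattern looks like using the langgraph-supervisor package with just two of the five stages; the notebook’s actual prompts and wiring live in the gist linked above:

```python
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from langgraph_supervisor import create_supervisor

model = ChatOpenAI(model="gpt-4.1")

# Two hypothetical stage agents; the notebook defines all five stages.
planner = create_react_agent(
    model, tools=[], prompt="Create a high-level blog outline.", name="planner"
)
writer = create_react_agent(
    model, tools=[], prompt="Draft the post from the outline.", name="writer"
)

# The supervisor decides which agent to hand control to at each step.
workflow = create_supervisor(
    [planner, writer],
    model=model,
    prompt="Delegate one stage at a time, in order, and finish after the draft.",
)
app = workflow.compile()
```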
The notebook demonstrates an example where the agents process a user query to craft a blog post, giving readers insight into my creative process.
Closing Remarks
During the experiment, the supervisor architecture would often stop prematurely, so I ended up tweaking the supervisor prompt to enforce a more predictable process.
For my use case, I found workflows to be more reliable and less prone to hallucinations and intermittent stoppages.
What’s Cooking 🔥
In terms of next steps:
Observability using LangSmith
Generative UI using CopilotKit
Integration with other LLM providers (Anthropic, Llama)
MCP tool integrations
Web App POC with Replit
Security with Cloudflare