AI UX: Building at the Speed of Trust
Gen AI apps need a useful, human-centric user experience, and it's essential for Product Managers to get it right from the get-go.
YouTube: The New Stack and Ops for AI by OpenAI
“Technology is as useful as the user experience surrounding it.”
AI copilots and chatbots present a distinct set of UX challenges. They also come with unique scaling considerations, which makes it even more important to drive responsible outcomes for users.
Two guiding principles help navigate these challenges:
Controlling for uncertainty with humans in the loop
Building guardrails for steerability and safety
Controlling for Uncertainty
Gen AI UX should augment human decisions, not replace them. Putting users at the center of AI-driven experiences places the control directly in their hands. One way to do this is by managing user expectations with an AI disclaimer.
You might have caught Gemini's disclaimer below the message bar that says "Gemini can make mistakes, so double-check it." This is a transparent way of notifying the user of the chatbot's limitations.
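As a rough illustration of keeping a human in the loop, the sketch below attaches a disclaimer to every reply and flags low-confidence replies for human review. Everything here is hypothetical: the `DraftReply` type, the `confidence` score, the `review_threshold` value, and the disclaimer wording are assumptions made for the example, not part of any product's API.

```python
from dataclasses import dataclass

# Hypothetical disclaimer text shown next to every AI reply.
DISCLAIMER = "This assistant can make mistakes, so double-check its answers."


@dataclass
class DraftReply:
    text: str
    confidence: float  # assumed score in [0, 1]; how it is produced is out of scope


def present_reply(draft: DraftReply, review_threshold: float = 0.7) -> dict:
    """Package a model reply for the UI without hiding its uncertainty."""
    return {
        "text": draft.text,
        "disclaimer": DISCLAIMER,
        # Replies below the threshold are routed to a person instead of
        # being presented as settled answers.
        "needs_human_review": draft.confidence < review_threshold,
    }


if __name__ == "__main__":
    print(present_reply(DraftReply("Your refund was approved.", confidence=0.55)))
```

The design choice is that the UI always shows the disclaimer, while the review flag decides whether a person confirms the reply before the user acts on it.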
Building with Guardrails
Guardrails are preventative controls that sit between the user experience and the model. They help prevent harmful content from reaching production apps and their users. They can take many forms across the compliance and security domains, including checks that help keep responses truthful.
Guardrails are essential when building AI-powered experiences for highly regulated industries that have a low tolerance for error.
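A minimal sketch of that placement, assuming a single regex-based policy: the guardrail checks the user's message before it reaches the model and checks the model's reply before it reaches the user. The `guarded_reply` and `violates_policy` names, the refusal text, and the one example pattern are illustrative assumptions; production guardrails typically combine many more checks.

```python
import re
from typing import Callable

# Hypothetical policy: block text that looks like a US Social Security number.
BLOCKED_PATTERNS = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]
REFUSAL = "Sorry, I can't help with that request."


def violates_policy(text: str) -> bool:
    """Return True if any blocked pattern appears in the text."""
    return any(pattern.search(text) for pattern in BLOCKED_PATTERNS)


def guarded_reply(user_message: str, model: Callable[[str], str]) -> str:
    """Run the model only between an input check and an output check."""
    if violates_policy(user_message):  # input guardrail
        return REFUSAL
    reply = model(user_message)
    if violates_policy(reply):  # output guardrail
        return REFUSAL
    return reply


if __name__ == "__main__":
    # Stand-in for a real model call.
    def echo_model(message: str) -> str:
        return f"You said: {message}"

    print(guarded_reply("What is a guardrail?", echo_model))
    print(guarded_reply("My SSN is 123-45-6789", echo_model))
```

Passing the model in as a plain callable keeps the guardrail independent of any particular provider, so the same checks can wrap different models.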
Evals
Building a user experience that aligns Gen AI models with human values is a great starting point, but our journey as AI Product Managers doesn't end there.
Everyone talks about the need to write evals that score Gen AI responses using automated metrics such as ROUGE, which measures n-gram overlap between a generated response and a reference text.
Hugging Face implementation below.
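To make the metric concrete, here is a minimal sketch of ROUGE-1 (unigram overlap) in plain Python. It is a simplified stand-in for illustration, not the Hugging Face `evaluate` implementation: it assumes lowercase whitespace tokenization with no stemming, and the `rouge1` function name is hypothetical.

```python
from collections import Counter


def rouge1(prediction: str, reference: str) -> dict:
    """Unigram-overlap precision, recall, and F1 between two strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # Clipped overlap: each word counts at most as often as it appears
    # in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())

    precision = overlap / len(pred_tokens) if pred_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(rouge1("the cat sat", "the cat sat on the mat"))
```

A prediction that copies half of the reference word for word scores perfect precision but only 0.5 recall, which is a reminder that ROUGE rewards overlap with a reference rather than factual accuracy as such.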
Vibe Checking
Tracking response accuracy is important for detecting model drift and correcting course in real time, but another type of eval is emerging that's much more...subjective: vibes.
“Tech industry insiders are increasingly relying on intuition, rather than hard data, to judge which AI chatbots are best.”
In the coming days, I'll be discussing SOTA innovations that help build user confidence in Gen AI apps, including:
Retrieval Augmented Generation (RAG)
LLM as a Judge
Reinforcement Learning from Human Feedback (RLHF)