AI UX: Building at the Speed of Trust

Building a useful, human-centric UX for Gen AI apps is something Product Managers need to get right from the get-go.

YouTube: The New Stack and Ops for AI by OpenAI

Technology is as useful as the user experience surrounding it.
— Shyamal Anadkat, OpenAI

AI copilots and chatbots present a distinct set of UX challenges, and those challenges compound at scale, making it even more important to drive responsible outcomes for users.

On the topic of UX, there are two guiding principles for navigating these challenges:

  1. Controlling for uncertainty with humans in the loop

  2. Building guardrails for steerability and safety


Controlling for Uncertainty

Gen AI UX should augment human decisions, not replace them. Putting users at the center of AI-driven experiences places control directly in their hands. One way to do this is by managing user expectations with an AI disclaimer.

You might have caught Gemini's disclaimer below the message bar that says "Gemini can make mistakes, so double-check it." This is a transparent way of notifying the user of the chatbot's limitations.


Building with Guardrails

Guardrails are preventative controls that sit between the user experience and the model. They help keep harmful content from reaching users in production apps, and they can take many forms across compliance and security domains to help ensure truthfulness.

Guardrails are essential when building AI-powered experiences for highly regulated industries that have a low tolerance for error.
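As a minimal sketch of the idea, an output guardrail can be as simple as a check that runs on the model's response before it reaches the user. The blocklist and regex pattern below are illustrative assumptions, not a production rule set:

```python
import re

# Illustrative rules only -- real guardrails are driven by the policy,
# compliance, and security requirements of the domain.
BLOCKED_TERMS = {"ssn", "password"}
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def apply_output_guardrail(response: str) -> str:
    """Screen a model response before it is shown to the user."""
    lowered = response.lower()
    # Refuse outright if the response touches a blocked topic.
    if any(term in lowered for term in BLOCKED_TERMS):
        return "Sorry, I can't share that information."
    # Redact anything that looks like an email address (possible PII).
    return EMAIL_PATTERN.sub("[redacted]", response)

print(apply_output_guardrail("Contact me at jane@example.com"))
# → Contact me at [redacted]
```

The same pattern applies on the input side (screening prompts before they reach the model); in practice, teams layer several such checks rather than relying on one.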


Evals

Building a user experience that aligns Gen AI models with human values is a great starting point, but our journey as AI Product Managers doesn't end there.

Everyone talks about the need to write evals that assess the accuracy of Gen AI responses using automated metrics such as ROUGE, which scores the overlap between a generated response and a reference answer.

Hugging Face's implementation is linked below.

🔗 https://huggingface.co/evaluate-metric


Vibe Checking

While tracking response accuracy is important for detecting model drift and correcting course in real time, another, much more subjective type of eval is emerging: it's referred to as vibes.


Tech industry insiders are increasingly relying on intuition, rather than hard data, to judge which AI chatbots are best.
— Miles Kruppa, WSJ

In the coming days, I'll be discussing SOTA innovations that help build user confidence in Gen AI apps, including:

  • Retrieval Augmented Generation (RAG)

  • LLM as a Judge

  • Reinforcement Learning with Human Feedback (RLHF)

Previous

AI Jargon 101

Next

Deepseek AI Made Quite the Splash 🐋