Airbnb Automation Platform v2 - LLM-Powered Conversational AI
Hybrid platform combining Chain of Thought LLM reasoning with traditional workflows-as-tools serving millions - flexible natural dialogue for nuanced queries while maintaining deterministic workflows for sensitive data/strict validations
Airbnb Automation Platform v2 - LLM-Powered Conversational AI
Source: Airbnb Engineering Blog - Automation Platform v2: Improving Conversational AI at Airbnb Authors: Chutian Wang, Zhiheng Xu, Paul Lou, Ziyi Wang, Jiayu Lou, Liuming Zhang, Jingwen Qiang, Clint Kelly, Lei Shi, Dan Zhao, Xu Hu, Jianqi Liao, Zecheng Xu, Tong Chen Date: October 28, 2024 Category: Productivity & Automation
Problem
Airbnb’s Automation Platform v1 supported conversational AI products through predefined step-by-step workflows designed by product engineering and business teams. While functional, this approach faced significant limitations.
Challenges of Traditional Conversational AI Systems (v1):
- Not flexible enough: AI products followed predefined (usually rigid) processes that couldn’t adapt to nuanced user needs
- Hard to scale: Product creators needed to manually create workflows and tasks for every scenario, repeating the entire process for any new use case
- Time-consuming and error-prone: Manual workflow creation was labor-intensive and prone to mistakes
- Limited natural dialogue: Predefined workflows couldn’t handle open-ended questions or capture nuanced information from ongoing conversations
The LLM Opportunity: Early experiments showed that LLM-powered conversation could provide a more natural and intelligent conversational experience than human-designed workflows. Customers could engage in natural dialogue, ask open-ended questions, and explain issues in detail, with LLMs accurately interpreting queries and capturing nuanced information.
The Challenge: LLM-powered applications were still relatively new, with the community working to improve aspects like latency and hallucination. Too early to fully rely on them for large-scale diverse experiences serving millions of customers. For instance, sensitive data and strict validations (like claim processing) were better suited for traditional workflows.
Solution
Airbnb developed Automation Platform v2, a conversational AI platform designed specifically for emerging LLM applications that combines the benefits of both LLM-powered conversation and traditional workflows.
Hybrid Approach: Best of Both Worlds
The team’s strategy: combine LLM-powered conversation with traditional workflows to leverage the benefits of both approaches:
- Traditional workflows: Better for sensitive data, strict validations, and deterministic processes
- LLM-powered workflows: Better for natural dialogue, nuanced understanding, and adaptive reasoning
Architecture Overview
When a user inquiry arrives (e.g., “where is my next reservation?”):
- Platform collects relevant contextual information (previous chat history, user id, user role, etc.)
- Platform loads and assembles prompt using inquiry and context, sends to LLM
- LLM response requests tool execution (e.g., service call to fetch most recent reservation)
- Platform follows order, executes actual service call, saves responses into current context
- Platform sends updated context to LLM
- LLM generates complete response (e.g., describing location of next reservation)
- Platform returns LLM response and records conversation for future reference
Key Components
1. Chain of Thought Workflow
Implementation of the Chain of Thought AI agent framework that enables LLMs to reason about issues.
Core Idea: Use LLM as reasoning engine to determine which tools to use and in which order. Tools are the way an LLM interacts with the world to solve real problems (e.g., checking reservation status, checking listing availability).
Main Steps:
- Prepare context: Prompt, contextual data, and historical conversations
- Trigger reasoning loop:
- Ask LLM for reasoning
- Execute LLM-requested tool
- Process tool’s outcome
- Loop until result: Chain of Thought stays in reasoning loop until a result is generated
Components:
- CoT IO Handler: Assemble prompt, prepare contextual data, collect user input, general data processing before sending to LLM
- Tool Manager: Prepare tool payload with LLM input & output, manage tool execution, offer quality of life features like retry or rate limiting
- LLM Adapter: Allow developers to add customized logic facilitating integration with different types of LLMs
Key Innovation: Tools are essentially actions and workflows from v1, which work well as tools in Chain of Thought because of their unified interface and managed execution environment.
2. Context Management
To ensure the LLM makes the best decision, the system provides all necessary and relevant information:
- Historical interactions with the LLM
- Intent of the customer support inquiry
- Current trip information
- For offline evaluation: point-in-time data retrieval
Developer Flexibility:
- Statically declare needed context (e.g., customer name)
- Name a dynamic context retriever (e.g., relevant help articles for customer’s questions)
Components:
- Context Loader: Connect to different sources and fetch relevant context based on developers’ customizable fetching logic
- Runtime Context Manager: Maintain runtime context, process context for each LLM call, interact with context storage
3. Guardrails Framework
A safe-guarding mechanism that monitors communications with the LLM, ensuring responses are helpful, relevant, and ethical.
Addresses LLM Issues:
- Hallucinations: LLM generating false or nonsensical information
- Jailbreaks: Attempts to bypass LLM safety restrictions
Architecture:
- Engineers from different teams create reusable guardrails
- During runtime, guardrails executed in parallel
- Leverage different downstream tech stacks
Examples:
- Content Moderation Guardrail: Calls various LLMs to detect violations in communication content
- Tool Guardrails: Use rules to prevent bad execution (e.g., updating listings with invalid setup)
4. Additional Developer Support Components
While not detailed in the post, the platform also includes:
- Playground feature: Bridges gap between development and production tech stacks, allowing prompt writers to freely iterate on prompts
- LLM-oriented observability: Detailed insights into each LLM interaction (latency, token usage)
- Enhanced Tool management: Responsible for tool registration, publishing process, execution, and observability
Impact
While specific metrics weren’t provided, the platform enables:
- More efficient customer support: Agents work more efficiently with LLM assistance
- Better resolutions: More accurate understanding of customer needs
- Quicker responses: Automated reasoning and tool execution
- Production deployment: Real system serving millions of Airbnb customers
- Developer productivity: Seamless integration between system and developer tools
Key Insights
-
Hybrid Approach Works Best: At this stage, combining LLM-powered conversation with traditional workflows leverages the benefits of both approaches. Don’t fully replace traditional systems yet.
-
Use Traditional Workflows for Sensitive Data: Processes requiring strict validations, sensitive data handling, or deterministic outcomes are better suited for traditional workflows.
-
Tools as Bridge Between Old and New: Reusing actions and workflows from v1 as tools in Chain of Thought provides unified interface and managed execution environment.
-
Context is Critical: Providing all necessary contextual information (history, intent, trip info, etc.) is essential for LLM decision-making quality.
-
Guardrails are Non-Negotiable: Production LLM applications must have robust guardrails to address hallucinations, jailbreaks, and ensure helpful, relevant, ethical responses.
-
Developer Experience Matters: Providing playground, observability, and tool management significantly improves developer productivity when building LLM applications.
-
Modular Architecture Enables Flexibility: Separating concerns (IO handling, tool management, LLM adapter, context management, guardrails) allows customization and scalability.
What’s Next
The team plans to continue evolving with transformative technologies:
- Explore other AI agent frameworks: Beyond Chain of Thought
- Expand Chain of Thought tool capabilities: Add more tools and improve existing ones
- Investigate LLM application simulation: Testing and validation approaches
- Anticipate further gains: Efficiency and productivity improvements for all AI practitioners at Airbnb
Technical Stack
- LLMs: Large language models (specific models not disclosed)
- Chain of Thought: AI agent framework for reasoning
- Automation Platform v1: Actions and workflows repurposed as tools
- Context Management System: Custom-built for context loading and runtime management
- Guardrails Framework: Multi-LLM content moderation and rule-based validation
- Developer Tools: Playground, observability, tool management
Architecture Comparison
Traditional Workflows (v1)
- Predefined step-by-step processes
- Static, rigid structure
- Manual creation for each scenario
- Time-consuming to scale
- Good for: Sensitive data, strict validations, deterministic processes
AI-Driven Workflows (v2)
- LLM reasoning to determine tool usage and order
- Dynamic, adaptive structure
- Automatic reasoning and tool selection
- Scales through additional tools
- Good for: Natural dialogue, nuanced understanding, open-ended questions
Hybrid Approach (v2 Strategy)
- Combines both traditional and AI-driven workflows
- Leverages strengths of each approach
- Traditional workflows called as tools by LLM
- Production-ready balance of flexibility and reliability
Tags: Conversational AI, LLM, Chain of Thought, Customer Support, Guardrails, Context Management, Tool Management, Hybrid Architecture, Production System, Developer Platform
Related Examples: