Distilled insights from 52 real-world implementations
Instead of showcasing time saved or ROI metrics, these insights focus on what actually matters: problem patterns you can recognize, architecture decisions with rationale, breakthrough insights that made implementations successful, and anti-patterns to avoid.
Does your problem look like any of these?
Duolingo
Delivery Hero (Woowa Brothers), Pinterest, Swiggy +1 more
Delivery Hero Quick Commerce
eBay
Grab, Dropbox, Harvard Business School +5 more
Honeycomb
Salesforce, Grab, Ramp +12 more
Spotify
Walmart, Grab
LlamaIndex, SoftIQ
Adobe, Salesforce, Bertelsmann +1 more
Adyen
Manus, Jeppesen (Boeing), Airtable
Meta
DoorDash, Vimeo
Instacart
Whatnot
Anthropic, Exa
How teams structured their agents—and why those choices mattered.
The critical decisions that made implementations successful. Not time saved—what actually worked.
Patterns over prompts: feeding the AI existing curriculum content as examples dramatically outperformed adding more constraints to prompt instructions. A generate-many-filter-to-best approach (create multiple candidate episodes, then have AI evaluators select the highest quality) proved more effective than trying to generate perfect content on the first try, and grounding generation in the curriculum keeps content level-appropriate and grammatically sound.
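A minimal sketch of that generate-many-filter-to-best loop, assuming a generic `llm.complete(prompt) -> str` client; the function names, prompt wording, and scoring rubric are illustrative stand-ins, not Duolingo's actual pipeline:

```python
# Hypothetical sketch: generate several candidate episodes conditioned
# on curriculum content as patterns, then keep only the best-scored one.

def generate_candidates(llm, curriculum_examples, n=8):
    """Generate n candidate episodes, using curriculum content as
    few-shot patterns rather than piling on prompt constraints."""
    prompt = "Write a new story episode matching the style and level of these examples:\n\n"
    prompt += "\n---\n".join(curriculum_examples)
    return [llm.complete(prompt) for _ in range(n)]

def best_episode(llm, candidates):
    """Have an LLM evaluator score each candidate; return the top one."""
    def score(text):
        verdict = llm.complete(
            "Rate this episode 1-10 for level-appropriateness and grammar. "
            "Reply with a number only.\n\n" + text
        )
        return float(verdict.strip())
    return max(candidates, key=score)
```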
Enriching table metadata with business terminology, few-shot examples from domain experts, and multi-stage retrieval turns generic GPT-4 into a domain-expert Text-to-SQL system fit for production. GPT-4 alone produces queries that lack company context, ignore data policies, and hallucinate; the combination of (1) augmented documentation with detailed column descriptions and business glossaries, (2) search algorithms that refine the question and select relevant examples, and (3) ReAct prompting that reasons step by step while dynamically retrieving context yields query quality that employees trust for actual work. The 'garbage in, garbage out' principle applies: foundation-model performance is capped by input quality.
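A sketch of how such a domain-enriched prompt might be assembled, with hypothetical `retrieve_tables` and `retrieve_examples` hooks standing in for the multi-stage retrieval described above:

```python
# Hypothetical sketch: inject table documentation, a business glossary,
# and retrieved expert-written examples ahead of the user question.

def build_sql_prompt(question, retrieve_tables, glossary, retrieve_examples):
    tables = retrieve_tables(question)        # multi-stage table retrieval
    examples = retrieve_examples(question)    # expert Q -> SQL pairs
    parts = ["You are a Text-to-SQL assistant. Think step by step."]
    parts.append("## Tables\n" + "\n".join(
        f"{t['name']}: {t['description']}\n  columns: {t['columns']}"
        for t in tables))
    parts.append("## Business glossary\n" + "\n".join(
        f"{term}: {meaning}" for term, meaning in glossary.items()))
    parts.append("## Examples\n" + "\n\n".join(
        f"Q: {ex['question']}\nSQL: {ex['sql']}" for ex in examples))
    parts.append(f"## Question\n{question}\nSQL:")
    return "\n\n".join(parts)
```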
Knowledge distillation dramatically reduced costs while maintaining quality: a teacher model (GPT-4o) trained a smaller student (GPT-4o-mini) to reach the same quality with much shorter prompts.
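A hedged sketch of that setup: the teacher answers with the full, verbose prompt, and the resulting (short prompt → teacher output) pairs become fine-tuning data for the student. The chat-message JSONL layout below is one common fine-tuning format, not necessarily the one used here:

```python
# Hypothetical sketch: build a distillation dataset where the student
# learns to reproduce teacher outputs from a much shorter prompt.

import json

def build_distillation_set(teacher, inputs, long_template, short_template):
    rows = []
    for x in inputs:
        # Teacher sees the long, fully specified prompt.
        target = teacher.complete(long_template.format(input=x))
        # Student will be fine-tuned on the short prompt -> same answer.
        rows.append({
            "messages": [
                {"role": "user", "content": short_template.format(input=x)},
                {"role": "assistant", "content": target},
            ]
        })
    return rows

def save_jsonl(rows, path):
    """Write rows in a JSONL layout suitable for a fine-tuning API."""
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
```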
An agent abstraction that decouples input/output from implementation allows multiple variations to honor the same contract while differing in complexity, models, and optimization, keeping agents compatible and reusable for rapid development. If a solution works in a developer's local environment for a single instance, the platform automatically scales it to eBay's industrial needs, serving hundreds of millions of users across billions of listings. Near-real-time (NRT) distributed queue-based messaging smooths peaks and valleys in user activity, giving consistent, controllable throughput and better GPU utilization.
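One way the contract idea can look in code, as a sketch: a shared request/response type plus a structural interface, so a cheap agent and a thorough agent are interchangeable to callers. All names here are hypothetical, not eBay's platform API:

```python
# Hypothetical sketch: every agent honors the same typed input/output
# contract, so implementations can vary in model, complexity, and cost
# without breaking callers.

from dataclasses import dataclass
from typing import Protocol

@dataclass
class AgentRequest:
    listing_id: str
    task: str

@dataclass
class AgentResponse:
    listing_id: str
    output: str

class Agent(Protocol):
    def run(self, request: AgentRequest) -> AgentResponse: ...

class CheapAgent:
    """Small model, short prompt: fast and inexpensive."""
    def run(self, request: AgentRequest) -> AgentResponse:
        return AgentResponse(request.listing_id, f"[small-model] {request.task}")

class ThoroughAgent:
    """Larger model with retrieval: same contract, higher quality."""
    def run(self, request: AgentRequest) -> AgentResponse:
        return AgentResponse(request.listing_id, f"[large-model] {request.task}")
```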
Documentation is the foundation for LLM discovery: without high-quality docs (coverage was raised from 20% to 90%), an LLM chatbot cannot work effectively. Four distinct search categories (exact, partial, inexact, semantic) require different solutions; keyword search handles 75% of queries, while the LLM handles the semantic remainder (see the routing sketch below). An incremental approach (Elasticsearch → docs → LLM) validated each step before the next, and leveraging the existing Glean tool accelerated go-to-market versus building from scratch.
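A minimal sketch of that tiered routing, assuming a keyword index with a `search` method and a semantic answerer callable; both are stand-ins, not the actual Elasticsearch or Glean APIs:

```python
# Hypothetical sketch: exact/partial/inexact queries go to keyword
# search; only queries it cannot satisfy fall through to the
# LLM-backed semantic tier over the documentation corpus.

def search(query, keyword_index, semantic_answerer, min_hits=1):
    """Try keyword retrieval first; fall back to semantic QA."""
    hits = keyword_index.search(query)
    if len(hits) >= min_hits:
        return {"tier": "keyword", "results": hits}
    return {"tier": "semantic", "results": semantic_answerer(query)}
```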
Single-pass generation is essential to avoid accuracy degradation: 90% per-call accuracy compounds to 59% over a chain of 5 LLM calls, so chaining multiple calls fails (the arithmetic is worked below). Few-shot prompting outperformed zero-shot and chain-of-thought. Accepting 'good enough' outputs (flexible interpretations, like a user typing 'slow') serves users better than rigid correctness; LLMs are engines for features, not standalone products. Filtering schemas by 7-day activity handles context limits for customers with 5,000+ fields.
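The compounding arithmetic, worked out:

```python
# If each LLM call in a chain is 90% accurate, the chain's overall
# accuracy is 0.9 ** n; five chained calls land near 59%.

for n in range(1, 6):
    print(n, round(0.9 ** n, 3))
# 1 0.9
# 2 0.81
# 3 0.729
# 4 0.656
# 5 0.59
```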
Automating quality evaluation with LLMs enables continuous, rapid iteration on search improvements that would be impossible with manual review. By cutting evaluation time from days or weeks to hours, LinkedIn can experiment with search enhancements and measure impact quickly. Slow feedback loops prevent experimentation and innovation; GPT-powered evaluation provides consistent, fast assessment that supports continuous improvement at a platform scale serving billions.
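A sketch of LLM-as-judge evaluation in that spirit: grade each query/result pair against a fixed rubric, then average over a query set. The rubric, the 0-3 scale, and the `llm.complete` client are illustrative assumptions:

```python
# Hypothetical sketch: score search quality with an LLM judge so
# experiments can be evaluated in hours instead of manual review cycles.

def judge_relevance(llm, query, result_snippet):
    """Return a 0-3 relevance grade from an LLM judge."""
    prompt = (
        "Grade how well the result answers the query.\n"
        "0=irrelevant, 1=partially relevant, 2=relevant, 3=perfect.\n"
        f"Query: {query}\nResult: {result_snippet}\n"
        "Reply with a single digit."
    )
    return int(llm.complete(prompt).strip())

def evaluate_ranker(llm, queries, ranker):
    """Mean judge score of the top result for each query."""
    scores = [judge_relevance(llm, q, ranker(q)[0]) for q in queries]
    return sum(scores) / len(scores)
```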
Table documentation quality trumps model sophistication: weighting embeddings toward table metadata increased search hit rate from 40% to 90%, showing that data governance is the primary bottleneck for Text-to-SQL performance. Real-world deployment revealed that table discovery (finding the right tables among hundreds of thousands) and metadata quality (accurate descriptions of table purpose and column meanings) matter far more than prompt engineering or model selection. Benchmarks like Spider miss this because they treat a small number of pre-specified, well-normalized tables as given.
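One plausible reading of "weighting embeddings toward table metadata", sketched below: embed the documentation and the raw schema separately, then combine with a weight favoring the documentation before ranking by cosine similarity. The 0.8 weight and the `embed` function are assumptions, not the reported configuration:

```python
# Hypothetical sketch: metadata-weighted table retrieval.

import numpy as np

def table_vector(embed, doc_text, schema_text, doc_weight=0.8):
    """Combine documentation and schema embeddings, favoring the docs."""
    v = doc_weight * embed(doc_text) + (1 - doc_weight) * embed(schema_text)
    return v / np.linalg.norm(v)

def top_tables(embed, query, tables, k=5):
    """Rank tables (each with a precomputed 'vector') by cosine similarity."""
    q = embed(query)
    q = q / np.linalg.norm(q)
    scored = sorted(tables, key=lambda t: -float(q @ t["vector"]))
    return scored[:k]
```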
Showing the top 8 of 52 insights.
Real-world constraints that shaped how teams built their agents.
What teams tried that didn't work. Learn from these failures to avoid repeating them.
Adding more English-only constraints to prompts for original content generation produced subpar results that required extensive manual editing; feeding curriculum content as patterns worked better.
— Duolingo
Trying to make the AI generate perfect content in one shot failed; a generate-many-filter-to-best approach with comprehensive evaluators was more effective.
— Duolingo
Giving the AI freedom to sequence exercises produced hit-or-miss quality; standardizing exercise order using learner session data improved reliability.
— Duolingo
Automated translations frequently missed accuracy and proficiency-level requirements; curriculum-driven generation aligned better with learning goals.
— Duolingo
Using GPT-4 with a basic prompt produces queries that lack company context, ignore data policies, and hallucinate; domain-enriched metadata and multi-stage retrieval are essential for production quality.
— Delivery Hero (Woowa Brothers)
Table schemas alone are insufficient to encode business logic; augmented documentation with detailed column descriptions, business-terminology glossaries, and few-shot SQL examples from domain experts is required.
— Delivery Hero (Woowa Brothers)
Showing 6 common failures.
Most-used frameworks in these implementations. Note: Popularity ≠ right for your problem.
Compare your requirements to these proven patterns. If you see similar problems, constraints, and complexity, you've found a validated starting point.