---
title: "Preventing AI agent hallucinations: 7 techniques that work"
canonical: "https://agenticup.dev/posts/preventing-ai-agent-hallucinations/"
pubDate: "2026-06-01T00:00:00.000Z"
description: "I've spent the last year trying to make AI agents tell the truth. Not perfectly — just reliably enough that I don't have to double-check every output. Here are 7 techniques that moved the needle."
tags: [hallucinations, reliability, ai agents, rag, prompt engineering]
---

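Lilian Weng's [agent survey](https://lilianweng.github.io/posts/2023-06-23-agent/) (Jun 2023) describes how agents lean on external memory and tool use to reach information the model weights lack. Grounding, meaning connecting LLM outputs to verifiable external data, builds on the same idea, and it is the most effective single technique I've found for reducing hallucinations in production agents.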

**TL;DR:** You can't eliminate hallucinations entirely, but you can reduce them from 15-25% to under 3% for well-scoped agents. The 7 techniques covered here — grounding with citations, constrained outputs, validation loops, multi-model voting, tool-first design, anti-pattern prompts, and confidence thresholds — each contribute significant reductions.

I spent the last year trying to make AI agents tell the truth. Not perfectly — just reliably enough that I don't have to double-check every output.

Here's the hard truth: you can't eliminate hallucinations entirely. LLMs are next-token predictors trained on internet text. They don't "know" facts. They generate text that looks like facts. Sometimes that text is wrong.

But you can reduce hallucination rates from the default 15-25% down to <3% for well-scoped agents. I've done it. Here's how.

> **Key takeaways:**
> - Grounding in retrieved data with forced source citations is the #1 most effective technique
> - Constrained output formats (JSON schema) prevent the model from making up fields
> - Validation loops catch 50-70% of remaining hallucinations — but cost 2-3x more in LLM calls
> - Multi-model voting works but is expensive — use it only for critical decisions
> - Tool-first design (prefer API calls over LLM generation) eliminates hallucinations entirely for those operations

## Technique 1: Grounding with source citations

The single most effective technique. When the agent must cite a specific source for every factual claim, hallucination rates in my agents drop by roughly 60-70%.

```python
from pydantic import BaseModel

class CitedFact(BaseModel):
    claim: str
    source_url: str
    source_excerpt: str
    confidence: float  # 0.0 to 1.0

class GroundedResponse(BaseModel):
    answer: str
    citations: list[CitedFact]
    
SYSTEM_PROMPT = """You are a research agent who ONLY uses information from the provided context documents.

Rules:
1. Every factual claim MUST be accompanied by a citation
2. If the context doesn't contain the information, say "I don't have information about this in the provided sources"
3. Do NOT use your training data to fill gaps — if it's not in the context, don't use it
4. Each citation must include the exact excerpt that supports your claim"""

async def grounded_agent(question: str, context_docs: list[str]) -> GroundedResponse:
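    # Label each document so citations can point back to a specific source
    context = "\n\n---\n\n".join(
        f"[SOURCE {i}]: {doc}"
        for i, doc in enumerate(context_docs)
    )

    # `llm` is assumed to be an async client wrapped by a structured-output
    # library such as instructor, which adds the `response_model` parameter
    response = await llm.chat.completions.create(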
        model="claude-sonnet-4-20250514",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
        ],
        response_model=GroundedResponse,
    )
    return response
```

**Why it works:** The `response_model` forces the model to attach evidence to every claim, and each `source_excerpt` can be checked against the context. When the context doesn't contain the answer, the model is more likely to say "I don't know" than to hallucinate, because a fabricated excerpt won't match any source and is easy to catch.

**Real reduction:** In my code review agent, citations reduced hallucinated bug reports by about 65%. The agent went from "This code has a race condition" (false) to "Lines 45-48 of file X show a potential issue: the lock is released before the async operation completes [Source: file X, lines 45-52]" (verifiable).

**When it fails:** When the context documents themselves contain contradictory information. The model cites one source while ignoring another.
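
Citations only help if someone checks them. Here is a minimal sketch of that check, not code from my agent: `normalize` and `find_unsupported_citations` are hypothetical helpers, citations are plain dicts shaped like `CitedFact`, and matching is a case- and whitespace-insensitive substring test.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line wrapping doesn't break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()

def find_unsupported_citations(citations: list[dict], context_docs: list[str]) -> list[dict]:
    """Return citations whose excerpt is not found in any context document."""
    docs = [normalize(doc) for doc in context_docs]
    unsupported = []
    for citation in citations:
        excerpt = normalize(citation["source_excerpt"])
        # An empty excerpt supports nothing, so treat it as unsupported too
        if not excerpt or not any(excerpt in doc for doc in docs):
            unsupported.append(citation)
    return unsupported

# Hypothetical sample data for illustration only
docs = ["The lock is released\nbefore the async operation completes."]
citations = [
    {"claim": "Lock released early", "source_excerpt": "lock is released before the async operation"},
    {"claim": "Uses a global mutex", "source_excerpt": "a global mutex guards every call"},
]
unsupported = find_unsupported_citations(citations, docs)
print([c["claim"] for c in unsupported])
```

A substring test is deliberately strict: a paraphrased excerpt gets flagged even when the claim is true, which fits rule 4 of the system prompt (exact excerpts only).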

## Technique 2: Constrained output formats

Use structured output (JSON schema) to constrain what the model can generate. This prevents it from making up fields, categories, or values.

```python
from pydantic import BaseModel, Field
from enum import Enum

class Severity(str, Enum):
    CRITICAL = "critical"
    WARNING = "warning"
    INFO = "info"

class ReviewFinding(BaseModel):
    file: str = Field(description="Path to the file with the issue")
    line: int = Field(description="Line number where the issue occurs")
    severity: Severity
    description: str = Field(max_length=500)
    suggested_fix: str | None = Field(default=None)

class CodeReview(BaseModel):
    summary: str = Field(max_length=200)
    findings: list[ReviewFinding] = Field(max_length=10)
    passed: bool
```

**Why it works:** The model can only fill in the fields you define. It can't add "security_score: 85/100" unless you put it in the schema. This prevents the model from inventing evaluation dimensions.

**Real reduction:** About 30% fewer hallucinated metrics. Without constraints, the model would invent things like "Code quality: 7.8/10" or "This file has medium complexity." With constraints, it only reports what you ask for.

**When it fails:** When the schema is too restrictive and the model has genuine insights it can't express. Leave room for optional fields.
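
Pydantic ignores unknown keys by default instead of rejecting them, so if you parse the model's raw JSON yourself it helps to surface what the model tried to invent. The sketch below is dependency-free and illustrative: `ALLOWED_FINDING_KEYS`, `ALLOWED_SEVERITIES`, and `check_finding` are hypothetical names that mirror `ReviewFinding` and `Severity`.

```python
import json

# Mirrors the fields of ReviewFinding and the values of Severity
ALLOWED_FINDING_KEYS = {"file", "line", "severity", "description", "suggested_fix"}
ALLOWED_SEVERITIES = {"critical", "warning", "info"}

def check_finding(raw_json: str) -> list[str]:
    """Return a list of problems found in one raw JSON finding."""
    data = json.loads(raw_json)
    problems = []
    # Keys outside the schema are fields the model invented
    for key in sorted(set(data) - ALLOWED_FINDING_KEYS):
        problems.append(f"unexpected field: {key}")
    # Severity must be one of the enum values, not a free-form label
    if data.get("severity") not in ALLOWED_SEVERITIES:
        problems.append(f"invalid severity: {data.get('severity')}")
    return problems

# Hypothetical model output with an invented field and an invented severity
raw = '{"file": "app.py", "line": 12, "severity": "medium", "description": "x", "security_score": "85/100"}'
problems = check_finding(raw)
print(problems)
```

If you would rather have Pydantic reject invented fields itself, setting `extra="forbid"` in the model config should give the same signal as a validation error.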

## Technique 3: Validation loops

Have the agent verify its own output before returning it. This catches 50-70% of remaining hallucinations.

```python
class ValidatedResponse(BaseModel):
    is_accurate: bool
    issues_found: list[str]
    corrected_output: str | None

async def agent_with_validation(task: str) -> str:
    # First pass: generate response
    # (`generate_response` and `get_relevant_context` are your own helpers)
    initial = await generate_response(task)
    
    # Validation pass: check own work
    validation = await llm.chat.completions.create(
        model="claude-sonnet-4-20250514",
        messages=[{
            "role": "system",
            "content": """You are a fact-checker. Review the assistant's response for:
1. Claims not supported by the provided context
2. Numbers that seem invented
3. Citations to non-existent sources
4. Logical contradictions

Be strict. If you find any issue, reject the response."""
        }, {
            "role": "user",
            "content": f"Context:\n{get_relevant_context(task)}\n\nResponse to verify:\n{initial}"
        }],
        response_model=ValidatedResponse,
    )
    
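    # If validation fails, regenerate once with the fact-checker's notes.
    # Note: the regenerated answer is returned without a second validation pass.
    if not validation.is_accurate:
        corrected = await generate_response(
            task + "\n\nNote from fact-checker: " + "\n".join(validation.issues_found)
        )
        return corrected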
    
    return initial
```

**Why it works:** Two heads are better than one — even if both heads are the same model. The validation pass looks at the output with a different mindset ("find mistakes" vs "generate answer").

**Real reduction:** In my research agent, validation loops catch about 60% of hallucinations. The corrected output is typically accurate.

**The cost tradeoff:** Each validation pass doubles or triples the LLM cost. For my code review agent, going from single-pass to validated-pass increased cost from $0.80/PR to $2.40/PR. Whether that's worth it depends on the cost of a wrong answer.

**When it fails:** When the validation model has the same blind spot as the generation model. The validator agrees with the hallucination because it looks plausible.
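
The loop above regenerates once and returns the result unchecked. A bounded retry loop closes that gap. This is a sketch under assumptions: `generate_with_retries` is a hypothetical name, `generate` and `validate` stand in for your LLM calls, and the two fake functions exist only so the example runs without a model.

```python
import asyncio
from typing import Awaitable, Callable

async def generate_with_retries(
    task: str,
    generate: Callable[[str], Awaitable[str]],
    validate: Callable[[str], Awaitable[list[str]]],
    max_attempts: int = 3,
) -> tuple[str, bool]:
    """Regenerate until the validator reports no issues or attempts run out.

    Returns the last draft and whether it passed validation.
    """
    prompt = task
    draft = ""
    for _ in range(max_attempts):
        draft = await generate(prompt)
        issues = await validate(draft)
        if not issues:
            return draft, True
        # Feed the validator's objections into the next attempt
        prompt = task + "\n\nNote from fact-checker: " + "\n".join(issues)
    return draft, False

# Stand-in generator and validator (hypothetical, no LLM involved)
async def fake_generate(prompt: str) -> str:
    return "limit is 60" if "fact-checker" in prompt else "limit is 100"

async def fake_validate(draft: str) -> list[str]:
    return [] if "60" in draft else ["Rate limit not supported by context"]

result = asyncio.run(
    generate_with_retries("What is the rate limit?", fake_generate, fake_validate)
)
print(result)
```

Returning the pass/fail flag alongside the draft lets the caller decide what to do with an answer that never passed, instead of silently shipping it. Each extra attempt adds a generation call and a validation call, so `max_attempts` is also your cost ceiling.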

## Technique 4: Multi-model voting

Ask two different models the same question. If they disagree, a third model adjudicates.

```python
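import asyncio

# `query_model` and `semantic_similarity` are helpers you supply: the first sends
# the question to one model and returns its answer text, the second scores two
# answers between 0 and 1 (for example, cosine similarity of embeddings).
async def multi_model_vote(question: str, context: str) -> str:
    models = [
        "claude-sonnet-4-20250514",
        "gpt-4o",
    ]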
    
    # Get answers from different models
    answers = await asyncio.gather(*[
        query_model(model, question, context)
        for model in models
    ])
    
    # If both agree, return the answer
    if semantic_similarity(answers[0], answers[1]) > 0.85:
        return answers[0]
    
    # If they disagree, adjudicate
    adjudication = await llm.chat.completions.create(
        model="claude-opus-4-20250514",
        messages=[{
            "role": "system",
            "content": "Two models gave different answers. "
                       "Review both, check against context, "
                       "and determine which is correct."
        }, {
            "role": "user",
            "content": f"Question: {question}\n\n"
                       f"Context: {context}\n\n"
                       f"Model A answer: {answers[0]}\n"
                       f"Model B answer: {answers[1]}"
        }],
    )
    return adjudication.choices[0].message.content
```

**Why it works:** Different models have different training data, different biases, and different failure modes. A fact that both Claude and GPT-4o agree on is much more likely to be correct than a fact stated by either alone.

**Real reduction:** In testing on 200 factual questions, multi-model voting reduced hallucination rate from 18% (single model) to 4% (two models + adjudication). That's a 78% reduction.

**The cost tradeoff:** This is expensive. Each query costs 2-3x more (two model calls + potential third). For my use cases, I only use this for critical decisions — like whether to approve a deployment or flag a security issue.

**When it fails:** When both models are wrong about the same thing — which happens for recent events (neither model was trained on last week's news) or obscure topics.
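
One caveat with the agreement check: two answers that differ only in a number ("60" vs "100") can still score above a similarity threshold. The sketch below shows one way to guard against that. It is a lexical stand-in, not real semantic similarity (which would normally use embeddings), and `word_set`, `number_set`, and `answers_agree` are hypothetical helpers.

```python
import re

def word_set(text: str) -> set[str]:
    """Lowercased alphabetic tokens, used for a rough lexical overlap score."""
    return set(re.findall(r"[a-z]+", text.lower()))

def number_set(text: str) -> set[str]:
    """All integers and decimals mentioned in the text."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))

def answers_agree(a: str, b: str, threshold: float = 0.85) -> bool:
    """Agree only if the wording overlaps AND the stated numbers match."""
    words_a, words_b = word_set(a), word_set(b)
    union = words_a | words_b
    # Jaccard overlap of the two word sets; two empty answers count as identical
    overlap = len(words_a & words_b) / len(union) if union else 1.0
    return overlap > threshold and number_set(a) == number_set(b)

a = "The API rate limit is 60 requests per minute for each key"
b = "The API rate limit is 100 requests per minute for each key"
c = "the api rate limit is 60 requests per minute for each key"

print(answers_agree(a, b))  # same wording, different number
print(answers_agree(a, c))  # same wording, same number
```

The number check is the part that matters here. Whatever similarity function you use, comparing the extracted figures separately sends numeric disagreements to the adjudicator.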

## Technique 5: Tool-first design

Prefer tool calls over LLM generation for any operation that can be automated deterministically.

```python
TOOL_DESCRIPTIONS = {
    "calculate": "Use for ALL mathematical operations. Never compute numbers yourself.",
    "lookup": "Use for looking up specific values from databases or APIs.",
    "current_time": "Use for ANY question about dates, times, or durations."
}

async def tool_first_agent(task: str) -> str:
    # The system prompt explicitly discourages LLM computation
    prompt = """You are a conservative agent. For every operation:
1. Ask: "Can I delegate this to a tool?"
2. If yes, use the tool. Do NOT generate the answer yourself.
3. Only use LLM knowledge for: summarization, explanation, creative tasks.

NEVER:
- Calculate numbers yourself (use calculate tool)
- Guess current dates or times (use current_time tool) 
- Look up information from memory (use lookup tool)"""

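    # `llm_with_tools` is a placeholder for your tool-calling loop; register the
    # tools there using the descriptions in TOOL_DESCRIPTIONS
    result = await llm_with_tools(prompt, task)
    return result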
```

**Why it works:** LLMs are terrible at math, bad at dates, and unreliable for factual lookups. Deterministic tools get all three right, provided the agent calls them with the right inputs. Every time you route a calculation through a tool instead of the LLM, you eliminate one class of hallucination entirely.

**Real reduction:** About 40% of hallucinations in my agents were in areas that should have been tool calls — calculating totals, looking up exchange rates, formatting dates. Moving these to tools eliminated them completely.

**Example:** My research agent used to hallucinate statistics such as "Global AI market was $136 billion in 2025." The real number was $136.6 billion (close), and another source put it at $142 billion, so a figure with no source attached can't be checked. Now the agent uses the lookup tool to check specific sources. Zero hallucinated numbers since.

**When it fails:** When there's no tool for an operation that needs one — and the agent defaults to guessing. The solution is to identify common failure modes and build tools for them.
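
For reference, the tools behind those descriptions can be small. Below is a sketch of what `calculate` and `current_time` might look like, assuming arithmetic is limited to `+`, `-`, `*`, and `/`. The implementations, `TOOLS`, and `run_tool` are hypothetical, and `lookup` is omitted because it depends on your data source.

```python
import ast
import operator
from datetime import datetime, timezone

# Only these arithmetic operators are allowed; anything else is rejected
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calculate(expression: str) -> float:
    """Evaluate a basic arithmetic expression without using eval()."""
    def evaluate(node: ast.AST) -> float:
        is_number = (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        )
        if is_number:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError(f"Unsupported expression: {expression!r}")
    return evaluate(ast.parse(expression, mode="eval").body)

def current_time() -> str:
    """Return the current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()

TOOLS = {"calculate": calculate, "current_time": current_time}

def run_tool(name: str, *args):
    """Dispatch a tool call by name, as an agent loop would."""
    return TOOLS[name](*args)

total = run_tool("calculate", "(120 + 80) * 3")
print(total)
```

Walking the parsed syntax tree instead of calling `eval()` means a model-supplied string like `__import__('os')` raises `ValueError` instead of executing.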

## Technique 6: Anti-patterns in system prompts

Some prompt patterns actively cause hallucinations. Here's what to avoid:

### Bad: Over-personalization
```python
# DON'T do this
SYSTEM_PROMPT = "You are a brilliant, confident expert who always knows the answer."
# This encourages the model to answer even when uncertain
```

### Better: Uncertainty-tolerant
```python
# DO this instead
SYSTEM_PROMPT = """You are a careful analyst. It's okay to say "I don't know."
In fact, if you're less than 90% confident, you MUST say "I don't know"
rather than guessing."""
```

### Bad: Leading questions
```python
# DON'T ask this
user = "Can you analyze why the deployment failed? Look at the error logs."
# The model will "find" failure causes even in successful deployments
```

### Better: Neutral framing
```python
# DO ask this
user = "Review the deployment logs and report what you find. If there are no errors, say so."
```

### Bad: Role authority
```python
# DON'T do this
SYSTEM_PROMPT = "You are a world-class security expert with 20 years experience."
# This inflates the model's confidence in areas it's not reliable
```

### Better: Role humility
```python
# DO this instead
SYSTEM_PROMPT = "You are a security reviewer. Verify every claim against the provided sources and note uncertainty explicitly."
```

**Real reduction:** Changing system prompt patterns alone reduced hallucinations by about 20-25% across my agents. Not the biggest win, but free and easy.

## Technique 7: Confidence thresholds

Set a minimum confidence level. If the agent's confidence is below threshold, return "I don't know" instead of a potentially wrong answer.

```python
class AgentOutput(BaseModel):
    content: str
    confidence: float = Field(ge=0.0, le=1.0)
    knowledge_gaps: list[str] = Field(default_factory=list)

CONFIDENCE_THRESHOLD = 0.7

async def safe_agent(task: str) -> str:
    output = await llm.chat.completions.create(
        model="claude-sonnet-4-20250514",
        messages=[{
            "role": "system",
            "content": """After generating your response, rate your confidence.
- 1.0: I have specific sources that directly answer this
- 0.7-0.9: I have relevant context but some inference was needed
- 0.4-0.6: I'm reasonably confident based on general knowledge
- 0.0-0.3: I'm guessing or unsure

If confidence is below 0.7, list your knowledge gaps clearly."""
        }, {
            "role": "user", "content": task
        }],
        response_model=AgentOutput,
    )
    
    if output.confidence < CONFIDENCE_THRESHOLD:
        # Fall back to a placeholder when the model reports no specific gaps
        gaps = ", ".join(output.knowledge_gaps) or "none reported"
        return (
            "I'm not confident enough to answer this.\n"
            f"Knowledge gaps: {gaps}\n\n"
            "Please provide more specific context or refine the question."
        )
    
    return output.content
```

**Why it works:** In my experience, the model's self-rated confidence is a useful signal when you ask for it explicitly. Without a confidence check, it will confidently state wrong answers. With the check, it defers when uncertain.

**Real reduction:** About 50% of remaining hallucinations are caught by confidence thresholds. The agent goes from "The API rate limit is 100 requests per minute" (wrong, it's 60) to "I don't have information about the specific rate limit in the provided documentation."

**When it fails:** The model can be overconfident (thinks it knows but doesn't) or underconfident (knows but doubts itself). Calibration varies by model and domain.
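
Because calibration varies, check the threshold against labeled outcomes before trusting it. Here is a sketch of that check, assuming you log each answer's self-reported confidence and later mark it correct or not. `accuracy_by_threshold` is a hypothetical helper and the records are made-up illustration data, not measurements.

```python
def accuracy_by_threshold(
    records: list[tuple[float, bool]],
    thresholds: list[float],
) -> dict[float, tuple[int, float]]:
    """For each threshold, report how many answers pass and how accurate they are."""
    report = {}
    for threshold in thresholds:
        # Answers at or above the threshold are the ones the agent would return
        passed = [correct for confidence, correct in records if confidence >= threshold]
        accuracy = sum(passed) / len(passed) if passed else 0.0
        report[threshold] = (len(passed), accuracy)
    return report

# Hypothetical log of (self-reported confidence, was the answer correct?)
records = [
    (0.95, True), (0.9, True), (0.85, True), (0.8, False),
    (0.75, True), (0.6, False), (0.5, True), (0.3, False),
]
report = accuracy_by_threshold(records, [0.5, 0.7, 0.9])
for threshold, (answered, accuracy) in report.items():
    print(f"threshold={threshold}: answered={answered}, accuracy={accuracy:.2f}")
```

Raising the threshold trades coverage for accuracy. If accuracy barely moves as the threshold rises, the model's confidence is poorly calibrated for that domain, and the threshold is filtering little beyond volume.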

## Putting it all together

For a production agent, here's my recommended stack:

| Tier | Techniques | Cost Multiplier | Target Hallucination Rate | Best For |
|---|---|---|---|---|
| Basic | Grounding + constraints | 1x | 5-10% | Content generation, drafts |
| Standard | Above + validation loops | 2-3x | 2-5% | Code review, research |
| Enterprise | Above + multi-model + confidence thresholds | 3-5x | <1% | Medical, financial, compliance |

My code review agent runs at the Standard tier. My personal research assistant runs at Basic. If I were building a medical diagnosis agent, it would run at Enterprise.
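
If you want the tiers in code, a cumulative config keeps the "Above + ..." structure of the table explicit. This is only a sketch: the technique identifiers, `TIER_ADDITIONS`, and `techniques_for` are hypothetical names.

```python
# Techniques added at each tier, mirroring the table; each tier builds on the last
TIER_ADDITIONS = [
    ("basic", ["grounding", "constrained_output"]),
    ("standard", ["validation_loop"]),
    ("enterprise", ["multi_model_voting", "confidence_threshold"]),
]

def techniques_for(tier: str) -> list[str]:
    """Return every technique enabled at the given tier, including lower tiers."""
    enabled: list[str] = []
    for name, additions in TIER_ADDITIONS:
        enabled.extend(additions)
        if name == tier:
            return enabled
    raise ValueError(f"Unknown tier: {tier!r}")

print(techniques_for("standard"))
```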

## The bottom line

Hallucinations aren't going away. LLMs are probabilistic — they can always be wrong. But with the right techniques, you can make them reliable enough for most production use cases.

The key insight: **design for hallucination, not against it.** Assume the agent will make things up. Build validation into the loop. Make "I don't know" the default response. Trust but verify — especially when the agent sounds confident.

---

*Related: [AI agent context window management](/posts/ai-agent-context-window-management/) — keeping your agent from forgetting. Also see [AI agent error handling patterns](/posts/ai-agent-error-handling-patterns/) for production reliability.*

*Related: [What is an AI agent? A complete beginner's guide for developers](/posts/what-is-an-ai-agent-beginners-guide/) — a beginner-friendly explanation of what AI agents are and how they work.*
