# Go AI Client Library - Complete Documentation

## Table of Contents

- [Home](#home)
- [Getting Started > Installation](#getting-started-installation)
- [Getting Started > Quick Start](#getting-started-quick-start)
- [Providers > Overview](#providers-overview)
- [Providers > LLM](#providers-llm)
- [Providers > Embeddings](#providers-embeddings)
- [Providers > Image Generation](#providers-image-generation)
- [Providers > Audio](#providers-audio)
- [Providers > Speech-to-Text](#providers-speech-to-text)
- [Providers > Rerankers](#providers-rerankers)
- [Providers > Fill-in-the-Middle](#providers-fill-in-the-middle)
- [Providers > Vision](#providers-vision)
- [Agent Framework > Overview](#agent-framework-overview)
- [Agent Framework > Session Management](#agent-framework-session-management)
- [Agent Framework > Persistent Memory](#agent-framework-persistent-memory)
- [Agent Framework > Streaming](#agent-framework-streaming)
- [Agent Framework > Hooks](#agent-framework-hooks)
- [Agent Framework > Tool Confirmation](#agent-framework-tool-confirmation)
- [Agent Framework > Sub-Agents](#agent-framework-sub-agents)
- [Agent Framework > Background Agents](#agent-framework-background-agents)
- [Agent Framework > Handoffs](#agent-framework-handoffs)
- [Agent Framework > Fan-Out](#agent-framework-fan-out)
- [Agent Framework > Continue/Resume](#agent-framework-continueresume)
- [Agent Framework > Context Strategies](#agent-framework-context-strategies)
- [Agent Framework > Toolsets](#agent-framework-toolsets)
- [Agent Framework > Instruction Templates](#agent-framework-instruction-templates)
- [Integrations > PostgreSQL](#integrations-postgresql)
- [Integrations > SQLite](#integrations-sqlite)
- [Integrations > pgvector](#integrations-pgvector)
- [Advanced > Batch Processing](#advanced-batch-processing)
- [Advanced > BYOM](#advanced-byom)
- [Advanced > MCP Integration](#advanced-mcp-integration)
- [Advanced > Tool Calling](#advanced-tool-calling)
- [Advanced > Structured
Output](#advanced-structured-output)
- [Advanced > Cost Tracking](#advanced-cost-tracking)
- [Advanced > Prompt Templates](#advanced-prompt-templates)
- [Advanced > OpenTelemetry Tracing](#advanced-opentelemetry-tracing)
- [Advanced > Configuration](#advanced-configuration)

---

## Home

> Source: index.md

# Go AI Client Library

[![Go Reference](https://pkg.go.dev/badge/github.com/joakimcarlsson/ai.svg)](https://pkg.go.dev/github.com/joakimcarlsson/ai) [![Go Report Card](https://goreportcard.com/badge/github.com/joakimcarlsson/ai)](https://goreportcard.com/report/github.com/joakimcarlsson/ai)

A comprehensive, multi-provider Go library for interacting with various AI models through unified interfaces. This library supports Large Language Models (LLMs), embedding models, image generation models, audio generation (text-to-speech), and rerankers from multiple providers including Anthropic, OpenAI, Google, AWS, Voyage AI, xAI, ElevenLabs, and more.

## Features

- **Multi-Provider Support** — Unified interface for 10+ AI providers
- **LLM Support** — Chat completions, streaming, tool calling, structured output
- **Agent Framework** — Multi-agent orchestration with sub-agents, handoffs, fan-out, session management, persistent memory, and context strategies
- **Embedding Models** — Text, multimodal, and contextualized embeddings
- **Image Generation** — Text-to-image generation with multiple quality and size options
- **Audio Generation** — Text-to-speech with voice selection and streaming support
- **Speech-to-Text** — Audio transcription and translation with timestamp support
- **Rerankers** — Document reranking for improved search relevance
- **Streaming Responses** — Real-time response streaming via Go channels
- **Tool Calling** — Native function calling with struct-tag schema generation
- **Structured Output** — Constrained generation with JSON schemas
- **MCP Integration** — Model Context Protocol support for advanced tooling
- **Multimodal Support** — Text and image
inputs across compatible providers
- **Cost Tracking** — Built-in token and character usage with cost calculation
- **Retry Logic** — Exponential backoff with configurable retry policies
- **Type Safety** — Full Go generics support for compile-time safety

## Quick Install

```bash
go get github.com/joakimcarlsson/ai
```

## Quick Example

```go
package main

import (
    "context"
    "fmt"
    "log"

    "github.com/joakimcarlsson/ai/message"
    "github.com/joakimcarlsson/ai/model"
    llm "github.com/joakimcarlsson/ai/providers"
)

func main() {
    ctx := context.Background()

    client, err := llm.NewLLM(
        model.ProviderOpenAI,
        llm.WithAPIKey("your-api-key"),
        llm.WithModel(model.OpenAIModels[model.GPT4o]),
        llm.WithMaxTokens(1000),
    )
    if err != nil {
        log.Fatal(err)
    }

    messages := []message.Message{
        message.NewUserMessage("Hello, how are you?"),
    }

    response, err := client.SendMessages(ctx, messages, nil)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(response.Content)
}
```

## Next Steps

- [Installation & Quick Start](getting-started/installation.md) — Get up and running
- [Provider Overview](providers/overview.md) — See all supported providers
- [Agent Framework](agent/overview.md) — Build multi-agent systems
- [Advanced Features](advanced/byom.md) — BYOM, MCP, cost tracking

---

## Getting Started > Installation

> Source: getting-started/installation.md

# Installation

## Requirements

- Go 1.25 or later

## Install

```bash
go get github.com/joakimcarlsson/ai
```

## Import

```go
import (
    "github.com/joakimcarlsson/ai/message"
    "github.com/joakimcarlsson/ai/model"
    llm "github.com/joakimcarlsson/ai/providers"
)
```

## Provider API Keys

Each provider requires its own API key. Set them as environment variables:

```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."
export VOYAGE_API_KEY="..."
export XAI_API_KEY="..."
export ELEVENLABS_API_KEY="..."
```

Or pass them directly when creating a client:

```go
client, err := llm.NewLLM(
    model.ProviderOpenAI,
    llm.WithAPIKey("your-api-key"),
    llm.WithModel(model.OpenAIModels[model.GPT4o]),
)
```

---

## Getting Started > Quick Start

> Source: getting-started/quick-start.md

# Quick Start

## Basic Usage

```go
package main

import (
    "context"
    "fmt"
    "log"

    "github.com/joakimcarlsson/ai/message"
    "github.com/joakimcarlsson/ai/model"
    llm "github.com/joakimcarlsson/ai/providers"
)

func main() {
    ctx := context.Background()

    client, err := llm.NewLLM(
        model.ProviderOpenAI,
        llm.WithAPIKey("your-api-key"),
        llm.WithModel(model.OpenAIModels[model.GPT4o]),
        llm.WithMaxTokens(1000),
    )
    if err != nil {
        log.Fatal(err)
    }

    messages := []message.Message{
        message.NewUserMessage("Hello, how are you?"),
    }

    response, err := client.SendMessages(ctx, messages, nil)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(response.Content)
}
```

## Streaming Responses

```go
stream := client.StreamResponse(ctx, messages, nil)

for event := range stream {
    switch event.Type {
    case types.EventContentDelta:
        fmt.Print(event.Content)
    case types.EventComplete:
        fmt.Printf("\nTokens used: %d\n", event.Response.Usage.InputTokens)
    case types.EventError:
        log.Fatal(event.Error)
    }
}
```

## Multimodal (Images)

```go
imageData, err := os.ReadFile("image.png")
if err != nil {
    log.Fatal(err)
}

msg := message.NewUserMessage("What's in this image?")
msg.AddAttachment(message.Attachment{
    MIMEType: "image/png",
    Data:     imageData,
})

messages := []message.Message{msg}
response, err := client.SendMessages(ctx, messages, nil)
```

## Your First Agent

```go
import (
    "github.com/joakimcarlsson/ai/agent"
    "github.com/joakimcarlsson/ai/agent/session"
)

myAgent := agent.New(llmClient,
    agent.WithSystemPrompt("You are a helpful assistant."),
    agent.WithTools(&weatherTool{}),
    agent.WithSession("user-123", session.FileStore("./sessions")),
)

response, _ := myAgent.Chat(ctx, "What's the weather in Tokyo?")
fmt.Println(response.Content)
```

See the [Agent
Framework](../agent/overview.md) section for the full guide.

---

## Providers > Overview

> Source: providers/overview.md

# Supported Providers

## LLM Providers

| Provider | Streaming | Tools | Structured Output | Attachments |
|----------|-----------|-------|-------------------|-------------|
| Anthropic (Claude) | ✅ | ✅ | ❌ | ✅ |
| OpenAI (GPT) | ✅ | ✅ | ✅ | ✅ |
| Google Gemini | ✅ | ✅ | ✅ | ✅ |
| AWS Bedrock | ✅ | ✅ | ❌ | ✅ |
| Azure OpenAI | ✅ | ✅ | ✅ | ✅ |
| Google Vertex AI | ✅ | ✅ | ✅ | ✅ |
| Groq | ✅ | ✅ | ✅ | ✅ |
| OpenRouter | ✅ | ✅ | ✅ | ✅ |
| xAI (Grok) | ✅ | ✅ | ✅ | ✅ |

## Embedding & Reranker Providers

| Provider | Text Embeddings | Multimodal Embeddings | Contextualized Embeddings | Rerankers |
|----------|-----------------|-----------------------|---------------------------|-----------|
| Voyage AI | ✅ | ✅ | ✅ | ✅ |
| OpenAI | ✅ | ❌ | ❌ | ❌ |

## Image Generation Providers

| Provider | Models | Quality Options | Size Options |
|----------|--------|-----------------|--------------|
| OpenAI | DALL-E 2, DALL-E 3, GPT Image 1 | standard, hd, low, medium, high | 256x256 to 1792x1024 |
| xAI (Grok) | Grok 2 Image | default | default |
| Google Gemini | Gemini 2.5 Flash Image, Imagen 3, Imagen 4, Imagen 4 Ultra, Imagen 4 Fast | default | Aspect ratios: 1:1, 3:4, 4:3, 9:16, 16:9 |

## Audio Generation Providers (Text-to-Speech)

| Provider | Models | Streaming | Voice Selection | Max Characters |
|----------|--------|-----------|-----------------|----------------|
| ElevenLabs | Multilingual v2, Turbo v2.5, Flash v2.5 | ✅ | ✅ | 10,000 - 40,000 |

## Speech-to-Text Providers (Transcription)

| Provider | Models | Streaming | Translation | Timestamps | Diarization |
|----------|--------|-----------|-------------|------------|-------------|
| OpenAI | Whisper-1, GPT-4o Transcribe, GPT-4o Mini Transcribe | ✅ | ✅ | ✅ | ✅ |

---

## Providers > LLM

> Source: providers/llm.md

# LLM Providers

## Creating a Client

```go
import (
    "github.com/joakimcarlsson/ai/model"
    llm "github.com/joakimcarlsson/ai/providers"
)

client, err := llm.NewLLM(
    model.ProviderOpenAI,
    llm.WithAPIKey("your-api-key"),
    llm.WithModel(model.OpenAIModels[model.GPT4o]),
    llm.WithMaxTokens(1000),
)
```

## Sending Messages

```go
messages := []message.Message{
    message.NewUserMessage("Hello, how are you?"),
}

response, err := client.SendMessages(ctx, messages, nil)
fmt.Println(response.Content)
```

## Streaming

```go
stream := client.StreamResponse(ctx, messages, nil)

for event := range stream {
    switch event.Type {
    case types.EventContentDelta:
        fmt.Print(event.Content)
    case types.EventComplete:
        fmt.Printf("\nTokens used: %d\n", event.Response.Usage.InputTokens)
    case types.EventError:
        log.Fatal(event.Error)
    }
}
```

## Multimodal (Images)

```go
imageData, err := os.ReadFile("image.png")
if err != nil {
    log.Fatal(err)
}

msg := message.NewUserMessage("What's in this image?")
msg.AddAttachment(message.Attachment{
    MIMEType: "image/png",
    Data:     imageData,
})

messages := []message.Message{msg}
response, err := client.SendMessages(ctx, messages, nil)
```

## Client Options

```go
client, err := llm.NewLLM(
    model.ProviderOpenAI,
    llm.WithAPIKey("your-key"),
    llm.WithModel(model.OpenAIModels[model.GPT4o]),
    llm.WithMaxTokens(2000),
    llm.WithTemperature(0.7),
    llm.WithTopP(0.9),
    llm.WithTimeout(30*time.Second),
    llm.WithStopSequences("STOP", "END"),
)
```

## Provider-Specific Options

```go
// Anthropic
llm.WithAnthropicOptions(
    llm.WithAnthropicBeta("beta-feature"),
)

// OpenAI
llm.WithOpenAIOptions(
    llm.WithOpenAIBaseURL("custom-endpoint"),
    llm.WithOpenAIExtraHeaders(map[string]string{
        "Custom-Header": "value",
    }),
)
```

---

## Providers > Embeddings

> Source: providers/embeddings.md

# Embeddings

## Text Embeddings

```go
import (
    "github.com/joakimcarlsson/ai/embeddings"
    "github.com/joakimcarlsson/ai/model"
)

embedder, err := embeddings.NewEmbedding(model.ProviderVoyage,
    embeddings.WithAPIKey(""),
    embeddings.WithModel(model.VoyageEmbeddingModels[model.Voyage35]),
)
if err != nil {
    log.Fatal(err)
}

texts := []string{
    "Hello, world!",
    "This is a test document.",
}

response, err := embedder.GenerateEmbeddings(context.Background(), texts)
if err != nil {
    log.Fatal(err)
}

for i, embedding := range response.Embeddings {
    fmt.Printf("Text: %s\n", texts[i])
    fmt.Printf("Dimensions: %d\n", len(embedding))
    fmt.Printf("First 5 values: %v\n", embedding[:5])
}
```

## Multimodal Embeddings

```go
embedder, err := embeddings.NewEmbedding(model.ProviderVoyage,
    embeddings.WithAPIKey(""),
    embeddings.WithModel(model.VoyageEmbeddingModels[model.VoyageMulti3]),
)

multimodalInputs := []embeddings.MultimodalInput{
    {
        Content: []embeddings.MultimodalContent{
            {Type: "text", Text: "This is a banana."},
            {Type: "image_url", ImageURL: "https://example.com/banana.jpg"},
        },
    },
}

response, err := embedder.GenerateMultimodalEmbeddings(context.Background(), multimodalInputs)
```

## Contextualized Embeddings

Embed document chunks with awareness of their surrounding context. Each chunk embedding incorporates information from the full document, improving retrieval for chunks that lack standalone meaning.

```go
documentChunks := [][]string{
    { // Document 1
        "Introduction to quantum computing...",
        "Qubits differ from classical bits...",
        "Quantum entanglement enables...",
    },
    { // Document 2
        "Machine learning overview...",
        "Neural networks consist of...",
    },
}

response, err := embedder.GenerateContextualizedEmbeddings(context.Background(), documentChunks)

// response.DocumentEmbeddings[0][1] = embedding for "Qubits differ..." with context from Document 1
```

## Client Options

```go
embedder, err := embeddings.NewEmbedding(
    model.ProviderVoyage,
    embeddings.WithAPIKey(""),
    embeddings.WithModel(model.VoyageEmbeddingModels[model.Voyage35]),
    embeddings.WithBatchSize(100),
    embeddings.WithDimensions(1024),
    embeddings.WithTimeout(30*time.Second),
    embeddings.WithVoyageOptions(
        embeddings.WithInputType("document"),
        embeddings.WithOutputDimension(1024),
        embeddings.WithOutputDtype("float"),
    ),
)
```

## Embedding Interface

```go
type Embedding interface {
    GenerateEmbeddings(ctx, texts, inputType...) (*EmbeddingResponse, error)
    GenerateMultimodalEmbeddings(ctx, inputs, inputType...) (*EmbeddingResponse, error)
    GenerateContextualizedEmbeddings(ctx, documentChunks, inputType...) (*ContextualizedEmbeddingResponse, error)
    Model() model.EmbeddingModel
}
```

---

## Providers > Image Generation

> Source: providers/image-generation.md

# Image Generation

## OpenAI DALL-E 3

```go
import (
    "github.com/joakimcarlsson/ai/image_generation"
    "github.com/joakimcarlsson/ai/model"
)

client, err := image_generation.NewImageGeneration(
    model.ProviderOpenAI,
    image_generation.WithAPIKey("your-api-key"),
    image_generation.WithModel(model.OpenAIImageGenerationModels[model.DALLE3]),
)
if err != nil {
    log.Fatal(err)
}

response, err := client.GenerateImage(
    context.Background(),
    "A serene mountain landscape at sunset with vibrant colors",
    image_generation.WithSize("1024x1024"),
    image_generation.WithQuality("hd"),
    image_generation.WithResponseFormat("b64_json"),
)
if err != nil {
    log.Fatal(err)
}

imageData, _ := image_generation.DecodeBase64Image(response.Images[0].ImageBase64)
os.WriteFile("image.png", imageData, 0644)
```

## Google Gemini Imagen 4

```go
client, err := image_generation.NewImageGeneration(
    model.ProviderGemini,
    image_generation.WithAPIKey("your-api-key"),
    image_generation.WithModel(model.GeminiImageGenerationModels[model.Imagen4]),
)

response, err := client.GenerateImage(
    context.Background(),
    "A futuristic cityscape at night",
    image_generation.WithSize("16:9"),
    image_generation.WithN(4),
)
```

## xAI Grok 2 Image

```go
client, err := image_generation.NewImageGeneration(
    model.ProviderXAI,
    image_generation.WithAPIKey("your-api-key"),
    image_generation.WithModel(model.XAIImageGenerationModels[model.XAIGrok2Image]),
)

response, err := client.GenerateImage(
    context.Background(),
    "A robot playing chess",
    image_generation.WithResponseFormat("b64_json"),
)
```

## Client Options

```go
// OpenAI/xAI
client, err := image_generation.NewImageGeneration(
    model.ProviderOpenAI,
    image_generation.WithAPIKey("your-key"),
    image_generation.WithModel(model.OpenAIImageGenerationModels[model.DALLE3]),
    image_generation.WithTimeout(60*time.Second),
    image_generation.WithOpenAIOptions(
        image_generation.WithOpenAIBaseURL("custom-endpoint"),
    ),
)

// Gemini
client, err := image_generation.NewImageGeneration(
    model.ProviderGemini,
    image_generation.WithAPIKey("your-key"),
    image_generation.WithModel(model.GeminiImageGenerationModels[model.Imagen4]),
    image_generation.WithTimeout(60*time.Second),
    image_generation.WithGeminiOptions(
        image_generation.WithGeminiBackend(genai.BackendVertexAI),
    ),
)
```

---

## Providers > Audio

> Source: providers/audio.md

# Audio Generation (Text-to-Speech)

## Basic Usage

```go
import (
    "github.com/joakimcarlsson/ai/audio"
    "github.com/joakimcarlsson/ai/model"
)

client, err := audio.NewAudioGeneration(
    model.ProviderElevenLabs,
    audio.WithAPIKey("your-api-key"),
    audio.WithModel(model.ElevenLabsAudioModels[model.ElevenTurboV2_5]),
)
if err != nil {
    log.Fatal(err)
}

response, err := client.GenerateAudio(
    context.Background(),
    "Hello! This is a demonstration of text-to-speech.",
    audio.WithVoiceID("EXAVITQu4vr4xnSDxMaL"),
)
if err != nil {
    log.Fatal(err)
}

os.WriteFile("output.mp3", response.AudioData, 0644)
fmt.Printf("Characters used: %d\n", response.Usage.Characters)
```

## Custom Voice Settings

```go
response, err := client.GenerateAudio(
    context.Background(),
    "This uses custom voice settings for enhanced expressiveness.",
    audio.WithVoiceID("EXAVITQu4vr4xnSDxMaL"),
    audio.WithStability(0.75),       // 0.0-1.0, higher = more consistent
    audio.WithSimilarityBoost(0.85), // 0.0-1.0, higher = more similar to original
    audio.WithStyle(0.5),            // 0.0-1.0, higher = more expressive
    audio.WithSpeakerBoost(true),    // Enhanced speaker similarity
)
```

## Streaming Audio

```go
chunkChan, err := client.StreamAudio(
    context.Background(),
    "This is a streaming audio example.",
    audio.WithVoiceID("EXAVITQu4vr4xnSDxMaL"),
    audio.WithOptimizeStreamingLatency(3), // 0-4, higher = lower latency
)
if err != nil {
    log.Fatal(err)
}

file, _ := os.Create("output_stream.mp3")
defer file.Close()

for chunk := range chunkChan {
    if chunk.Error != nil {
        log.Fatal(chunk.Error)
    }
    if chunk.Done {
        break
    }
    file.Write(chunk.Data)
}
```

## List Available Voices

```go
voices, err := client.ListVoices(context.Background())
if err != nil {
    log.Fatal(err)
}

for _, voice := range voices {
    fmt.Printf("%s (%s) - %s\n", voice.Name, voice.VoiceID, voice.Category)
}
```

## Alignment Data

Enable character-level timing information for subtitles, word highlighting, or lip sync:

```go
response, err := client.GenerateAudio(
    context.Background(),
    "Hello, world!",
    audio.WithVoiceID("EXAVITQu4vr4xnSDxMaL"),
    audio.WithAlignmentEnabled(true),
)

// response.Alignment contains character-level timing
for i, char := range response.Alignment.Characters {
    fmt.Printf("%s: %.2fs - %.2fs\n", char,
        response.Alignment.CharacterStartTimesSeconds[i],
        response.Alignment.CharacterEndTimesSeconds[i],
    )
}
```

Alignment is also available per-chunk during streaming via
`chunk.Alignment`.

## Forced Alignment

Match existing audio with a transcript to produce word-level timing data. The provider must implement the `ForcedAlignmentProvider` interface:

```go
if aligner, ok := client.(audio.ForcedAlignmentProvider); ok {
    audioData, _ := os.ReadFile("speech.mp3")
    result, err := aligner.GenerateForcedAlignment(ctx, audioData, "Hello, world!")
    if err != nil {
        log.Fatal(err)
    }
    for _, word := range result.Words {
        fmt.Printf("%s: %.2fs - %.2fs\n", word.Text, word.Start, word.End)
    }
}
```

## Generation Options

| Option | Description |
|--------|-------------|
| `WithVoiceID(id)` | Voice to use for generation |
| `WithOutputFormat(fmt)` | Audio format (`mp3_44100_128`, `pcm_16000`, etc.) |
| `WithStability(f)` | Voice consistency, 0.0–1.0 |
| `WithSimilarityBoost(f)` | Match to original voice, 0.0–1.0 |
| `WithStyle(f)` | Style exaggeration, 0.0–1.0 |
| `WithSpeakerBoost(bool)` | Enhanced speaker similarity |
| `WithOptimizeStreamingLatency(n)` | Latency optimization level, 0–4 |
| `WithAlignmentEnabled(bool)` | Enable character-level timing data |

## Client Options

```go
client, err := audio.NewAudioGeneration(
    model.ProviderElevenLabs,
    audio.WithAPIKey("your-key"),
    audio.WithModel(model.ElevenLabsAudioModels[model.ElevenTurboV2_5]),
    audio.WithTimeout(30*time.Second),
    audio.WithElevenLabsOptions(
        audio.WithElevenLabsBaseURL("custom-endpoint"),
    ),
)
```

---

## Providers > Speech-to-Text

> Source: providers/speech-to-text.md

# Speech-to-Text (Transcription)

## Basic Transcription

```go
import (
    "github.com/joakimcarlsson/ai/transcription"
    "github.com/joakimcarlsson/ai/model"
)

client, err := transcription.NewSpeechToText(
    model.ProviderOpenAI,
    transcription.WithAPIKey("your-api-key"),
    transcription.WithModel(model.OpenAITranscriptionModels[model.Whisper1]),
)
if err != nil {
    log.Fatal(err)
}

audioData, err := os.ReadFile("audio.mp3")
if err != nil {
    log.Fatal(err)
}

response, err := client.Transcribe(context.Background(), audioData)
if err != nil {
    log.Fatal(err)
}
fmt.Println(response.Text)
```

## Transcription with Options

```go
response, err := client.Transcribe(ctx, audioData,
    transcription.WithLanguage("en"),
    transcription.WithResponseFormat("verbose_json"),
    transcription.WithTimestampGranularities("word", "segment"),
    transcription.WithTemperature(0.2),
)

for _, segment := range response.Segments {
    fmt.Printf("[%.2fs - %.2fs] %s\n", segment.Start, segment.End, segment.Text)
}

for _, word := range response.Words {
    fmt.Printf("%s (%.2fs) ", word.Word, word.Start)
}
```

## Translation (to English)

```go
response, err := client.Translate(ctx, audioData,
    transcription.WithPrompt("Translate this Swedish audio to English"),
)

fmt.Println(response.Text)
```

## Client Options

```go
client, err := transcription.NewSpeechToText(
    model.ProviderOpenAI,
    transcription.WithAPIKey("your-key"),
    transcription.WithModel(model.OpenAITranscriptionModels[model.GPT4oTranscribe]),
    transcription.WithTimeout(30*time.Second),
)
```

---

## Providers > Rerankers

> Source: providers/rerankers.md

# Document Reranking

## Basic Usage

```go
import (
    "github.com/joakimcarlsson/ai/rerankers"
    "github.com/joakimcarlsson/ai/model"
)

reranker, err := rerankers.NewReranker(model.ProviderVoyage,
    rerankers.WithAPIKey(""),
    rerankers.WithModel(model.VoyageRerankerModels[model.Rerank25Lite]),
    rerankers.WithReturnDocuments(true),
)
if err != nil {
    log.Fatal(err)
}

query := "What is machine learning?"
documents := []string{
    "Machine learning is a subset of artificial intelligence.",
    "The weather today is sunny.",
    "Deep learning uses neural networks.",
}

response, err := reranker.Rerank(context.Background(), query, documents)
if err != nil {
    log.Fatal(err)
}

for i, result := range response.Results {
    fmt.Printf("Rank %d (Score: %.4f): %s\n", i+1, result.RelevanceScore, result.Document)
}
```

## Client Options

```go
reranker, err := rerankers.NewReranker(
    model.ProviderVoyage,
    rerankers.WithAPIKey(""),
    rerankers.WithModel(model.VoyageRerankerModels[model.Rerank25Lite]),
    rerankers.WithTopK(10),
    rerankers.WithReturnDocuments(true),
    rerankers.WithTruncation(true),
    rerankers.WithTimeout(30*time.Second),
)
```

---

## Providers > Fill-in-the-Middle

> Source: providers/fim.md

# Fill-in-the-Middle (FIM)

Code completion by providing a prompt (code before the cursor) and an optional suffix (code after the cursor), with the model filling in the middle. Useful for code editors and IDE integrations.
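In an editor integration, the prompt/suffix pair typically comes from splitting the buffer at the cursor offset. A minimal sketch of that split (the `splitAtCursor` helper is hypothetical, not part of this library):

```go
package main

import "fmt"

// splitAtCursor divides an editor buffer into the FIM prompt
// (text before the cursor) and suffix (text after it), clamping
// the cursor to the buffer bounds. Byte offsets are assumed here
// for simplicity; a real editor may track rune or UTF-16 offsets.
func splitAtCursor(buf string, cursor int) (prompt, suffix string) {
    if cursor < 0 {
        cursor = 0
    }
    if cursor > len(buf) {
        cursor = len(buf)
    }
    return buf[:cursor], buf[cursor:]
}

func main() {
    src := "func Add(a, b int) int {\n    \n}"
    // Cursor sits inside the empty function body, after the indentation.
    prompt, suffix := splitAtCursor(src, 29)
    fmt.Printf("prompt=%q\nsuffix=%q\n", prompt, suffix)
}
```

The two returned strings map directly onto the `Prompt` and `Suffix` fields of the request shown below.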
## Supported Providers

| Provider | Model |
|----------|-------|
| Mistral | Codestral |
| DeepSeek | DeepSeek Coder |

## Setup

```go
import (
    "github.com/joakimcarlsson/ai/fim"
    "github.com/joakimcarlsson/ai/model"
)

client, err := fim.NewFIM(model.ProviderMistral,
    fim.WithAPIKey(os.Getenv("MISTRAL_API_KEY")),
    fim.WithModel(model.MistralModels[model.Codestral]),
)
if err != nil {
    log.Fatal(err)
}
```

## Basic Completion

```go
maxTokens := int64(100)

resp, err := client.Complete(ctx, fim.Request{
    Prompt:    "func Add(a, b int) int {\n ",
    Suffix:    "\n}",
    MaxTokens: &maxTokens,
})
if err != nil {
    log.Fatal(err)
}

fmt.Println(resp.Content) // "return a + b"
```

## Streaming

```go
events := client.CompleteStream(ctx, fim.Request{
    Prompt:    "func Max(numbers []int) int {\n ",
    Suffix:    "\n}",
    MaxTokens: &maxTokens,
})

for event := range events {
    switch event.Type {
    case fim.EventContentDelta:
        fmt.Print(event.Content)
    case fim.EventComplete:
        fmt.Printf("\nTokens: %d in, %d out\n",
            event.Response.Usage.InputTokens,
            event.Response.Usage.OutputTokens,
        )
    case fim.EventError:
        log.Fatal(event.Error)
    }
}
```

## Request

| Field | Type | Description |
|-------|------|-------------|
| `Prompt` | `string` | Code before the cursor (required) |
| `Suffix` | `string` | Code after the cursor (optional) |
| `MaxTokens` | `*int64` | Max tokens to generate |
| `Temperature` | `*float64` | Sampling temperature (0.0–1.0) |
| `TopP` | `*float64` | Nucleus sampling probability |
| `Stop` | `[]string` | Sequences that halt generation |
| `RandomSeed` | `*int64` | Seed for deterministic output |

## Client Options

| Option | Description |
|--------|-------------|
| `fim.WithAPIKey(key)` | API key for authentication |
| `fim.WithModel(m)` | Model to use |
| `fim.WithMaxTokens(n)` | Default max tokens |
| `fim.WithTemperature(t)` | Default temperature |
| `fim.WithTopP(p)` | Default top-p |
| `fim.WithTimeout(d)` | API request timeout |
| `fim.WithMistralOptions(...)` | Mistral-specific options |
| `fim.WithDeepSeekOptions(...)` | DeepSeek-specific options |

---

## Providers > Vision

> Source: providers/vision.md

# Vision (Multimodal Images)

Send images to LLMs for analysis using URL references or raw binary data. Works with any provider that supports multimodal input (Anthropic, OpenAI, Gemini).

## Image from URL

```go
import "github.com/joakimcarlsson/ai/message"

msg := message.NewUserMessage("What do you see in this image?")
msg.AddImageURL("https://example.com/photo.jpg", "")

response, err := client.SendMessages(ctx, []message.Message{msg}, nil)
fmt.Println(response.Content)
```

The second argument to `AddImageURL` is an optional detail level (`"low"`, `"high"`, or `""` for auto).

## Image from Binary Data

```go
imageData, _ := os.ReadFile("photo.jpg")

msg := message.NewUserMessage("Describe this image.")
msg.AddBinary("image/jpeg", imageData)

response, err := client.SendMessages(ctx, []message.Message{msg}, nil)
fmt.Println(response.Content)
```

## Multiple Images

```go
msg := message.NewUserMessage("Compare these two images.")
msg.AddImageURL("https://example.com/before.jpg", "")
msg.AddImageURL("https://example.com/after.jpg", "")

response, err := client.SendMessages(ctx, []message.Message{msg}, nil)
```

## MultiModalMessage

For full control, build messages with the `MultiModalMessage` type directly:

```go
msg := message.NewUserMultiModalMessage([]message.MultiModalContent{
    message.NewTextContent("What's in this image?"),
    message.NewImageURLContent("https://example.com/photo.jpg", "high"),
})

// Or with attachments
msg := message.NewUserMultiModalMessageWithAttachments(
    "Describe these files.",
    []message.Attachment{
        {MIMEType: "image/png", Data: pngData},
        {MIMEType: "image/jpeg", Data: jpegData},
    },
)
```

## Content Types

| Type | Constructor | Description |
|------|-------------|-------------|
| `text` | `NewTextContent(text)` | Text content |
| `image_url` | `NewImageURLContent(url, detail)` | Image from URL |
| `binary` | `NewBinaryContent(mimeType, data)` | Raw binary data (base64-encoded for the provider) |

## Supported Formats

Most providers accept JPEG, PNG, GIF, and WebP. Check your provider's documentation for size limits.

---

## Agent Framework > Overview

> Source: agent/overview.md

# Agent Framework

The agent package provides multi-agent orchestration with automatic tool execution, session management, persistent memory, sub-agents, handoffs, fan-out, and context strategies.

## Basic Agent

```go
import (
    "github.com/joakimcarlsson/ai/agent"
    "github.com/joakimcarlsson/ai/agent/session"
)

myAgent := agent.New(llmClient,
    agent.WithSystemPrompt("You are a helpful assistant."),
    agent.WithTools(&weatherTool{}),
    agent.WithSession("user-123", session.FileStore("./sessions")),
)

response, _ := myAgent.Chat(ctx, "What's the weather in Tokyo?")
fmt.Println(response.Content)
```

## How It Works

When you call `Chat()`, the agent:

1. Builds the message history (system prompt + session messages + user message)
2. Sends messages to the LLM
3. If the LLM requests tool calls, executes them automatically
4. Loops back to step 2 with tool results until the LLM responds with text
5. Persists the conversation to the session store

## Configuration Options

| Option | Description | Default |
|--------|-------------|---------|
| `WithSystemPrompt(prompt)` | Sets the agent's behavior | none |
| `WithTools(tools...)` | Adds tools the agent can use | none |
| `WithSession(id, store)` | Enables conversation persistence | none |
| `WithMemory(id, store, opts...)` | Enables long-term memory | none |
| `WithMaxIterations(n)` | Max tool execution loops | 10 |
| `WithAutoExecute(bool)` | Auto-execute tool calls | true |
| `WithContextStrategy(strategy, maxTokens)` | Context window management | none |
| `WithSequentialToolExecution()` | Disable parallel tool execution | parallel |
| `WithMaxParallelTools(n)` | Limit concurrent tool execution | unlimited |
| `WithState(map)` | Template variables for system prompt | none |
| `WithInstructionProvider(fn)` | Dynamic system prompt generation | none |
| `WithHooks(hooks...)` | Add hook interceptors for observation/interception | none |
| `WithConfirmationProvider(fn)` | Require human approval for sensitive tools | none |
| `WithSubAgents(configs...)` | Register child agents | none |
| `WithHandoffs(configs...)` | Register peer agents for transfer | none |
| `WithFanOut(configs...)` | Register parallel task distribution | none |

## ChatResponse

```go
type ChatResponse struct {
    Content      string
    ToolCalls    []message.ToolCall
    ToolResults  []ToolExecutionResult
    Usage        llm.TokenUsage
    FinishReason message.FinishReason
    AgentName    string // Set when a handoff occurred

    TotalToolCalls int
    TotalDuration  time.Duration
    TotalTurns     int
}
```

All metrics are aggregated across the full agent loop, not just the final LLM call:

| Field | Description |
|-------|-------------|
| `TotalTurns` | Number of LLM round-trips (API calls) made |
| `TotalDuration` | Wall-clock time from `Chat()` entry to return |
| `TotalToolCalls` | Total tool invocations across all iterations |
| `ToolResults` | Results of every tool execution during the conversation |

## Debug APIs

Inspect the messages that would be sent to the LLM after applying context strategies:

```go
// Non-destructive — does not modify the session
messages, err := myAgent.PeekContextMessages(ctx, "Hello")

// Modifying — adds the user message to the session
messages, err := myAgent.BuildContextMessages(ctx, "Hello")
```

Use `PeekContextMessages` to debug context window management without side effects.

---

## Agent Framework > Session Management

> Source: agent/sessions.md

# Session Management

Sessions persist conversation history across multiple `Chat()` calls.

## Setup

```go
import "github.com/joakimcarlsson/ai/agent/session"

myAgent := agent.New(llmClient,
    agent.WithSystemPrompt("You are a helpful assistant."),
    agent.WithSession("conversation-id", session.FileStore("./sessions")),
)
```

## Built-in Stores

```go
// Persistent JSON files
store := session.FileStore("./sessions")

// In-memory (ephemeral, lost on restart)
store := session.MemoryStore()
```

## Database Stores

Ready-to-use stores for production backends:

- [PostgreSQL](../integrations/postgres.md) — `postgres.SessionStore(ctx, connString)`
- [SQLite](../integrations/sqlite.md) — `sqlite.SessionStore(ctx, db)`

## Store Interface

Implement this interface to use any backend:

```go
type Store interface {
    Exists(ctx context.Context, id string) (bool, error)
    Create(ctx context.Context, id string) (Session, error)
    Load(ctx context.Context, id string) (Session, error)
    Delete(ctx context.Context, id string) error
}
```

## Session Interface

```go
type Session interface {
    ID() string
    GetMessages(ctx context.Context, limit *int) ([]message.Message, error)
    AddMessages(ctx context.Context, msgs []message.Message) error
    PopMessage(ctx context.Context) (*message.Message, error)
    Clear(ctx context.Context) error
}
```

---

## Agent Framework > Persistent Memory

> Source: agent/memory.md

# Persistent Memory

Memory enables cross-conversation fact storage and retrieval using vector-based semantic
search. ## Setup ```go import "github.com/joakimcarlsson/ai/agent/memory" store := memory.NewStore(embedder) myAgent := agent.New(llmClient, agent.WithSystemPrompt("You are a personal assistant."), agent.WithMemory("user-123", store, memory.AutoExtract(), // Auto-extract facts from conversations memory.AutoDedup(), // LLM-based memory deduplication ), ) response, _ := myAgent.Chat(ctx, "My name is Alice and I'm allergic to peanuts.") // Agent automatically stores this fact and recalls it in future conversations ``` ## Built-in Stores ```go // In-memory vector store store := memory.NewStore(embedder) // File-persisted vector store store := memory.FileStore("./memories", embedder) ``` ## Memory Options | Option | Description | |--------|-------------| | `memory.AutoExtract()` | Automatically extract facts from conversations after each response | | `memory.AutoDedup()` | Use LLM to deduplicate similar memories before storing | | `memory.LLM(l)` | Use a separate (cheaper) LLM for extraction and deduplication | ## Database Stores Ready-to-use stores for production backends: - [pgvector](../integrations/pgvector.md) — `pgvector.MemoryStore(ctx, connString, embedder)` — PostgreSQL with HNSW vector search ## Store Interface Implement for any vector database backend: ```go type Store interface { Store(ctx context.Context, id string, fact string, metadata map[string]any) error Search(ctx context.Context, id string, query string, limit int) ([]Entry, error) GetAll(ctx context.Context, id string, limit int) ([]Entry, error) Delete(ctx context.Context, memoryID string) error Update(ctx context.Context, memoryID string, fact string, metadata map[string]any) error } ``` ## How It Works When `AutoExtract` is enabled: 1. After the agent responds, it reviews the conversation 2. An LLM extracts factual information worth remembering 3. If `AutoDedup` is enabled, the LLM checks for existing similar memories 4. 
New facts are stored, duplicates are merged or skipped ## Manual Memory Tools When `AutoExtract` is disabled, the agent gets four memory tools that the LLM can call directly: ### store_memory Store a fact about the user for future conversations. | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `fact` | string | yes | The fact to remember | | `category` | string | no | One of: `preference`, `personal`, `health`, `professional`, `other` | ### recall_memories Search for relevant memories. Returns memory IDs for use with replace/delete. | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `query` | string | yes | What to search for | ### replace_memory Update an existing memory with corrected or updated information. | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `memory_id` | string | yes | ID from `recall_memories` results | | `fact` | string | yes | The updated fact | | `category` | string | no | One of: `preference`, `personal`, `health`, `professional`, `other` | ### delete_memory Remove a memory that is no longer accurate or relevant. | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `memory_id` | string | yes | ID from `recall_memories` results | | `reason` | string | no | Why the memory is being deleted | --- ## Agent Framework > Streaming > Source: agent/streaming.md # Streaming `ChatStream` returns a channel of events for real-time response handling. 
## Basic Usage ```go for event := range myAgent.ChatStream(ctx, "Tell me a story") { switch event.Type { case types.EventContentDelta: fmt.Print(event.Content) case types.EventThinkingDelta: // Extended thinking content (if supported) case types.EventToolUseStart: fmt.Printf("\nUsing tool: %s\n", event.ToolCall.Name) case types.EventToolUseStop: if event.ToolResult != nil { fmt.Printf("Tool result: %s\n", event.ToolResult.Output) } case types.EventHandoff: fmt.Printf("Handed off to: %s\n", event.AgentName) case types.EventComplete: fmt.Printf("\nDone! Tokens: %d\n", event.Response.Usage.InputTokens) case types.EventError: log.Fatal(event.Error) } } ``` ## ContinueStream The streaming variant of `Continue()`: ```go for event := range myAgent.ContinueStream(ctx, toolResults) { switch event.Type { case types.EventContentDelta: fmt.Print(event.Content) case types.EventComplete: fmt.Println("\nDone!") } } ``` ## Event Types | Event | Field | Description | |-------|-------|-------------| | `EventContentStart` | — | Content generation is beginning | | `EventContentDelta` | `Content` | Partial text token | | `EventContentStop` | — | Content generation finished | | `EventToolUseStart` | `ToolCall` | Tool invocation starting (name, ID) | | `EventToolUseDelta` | `ToolCall` | Partial tool input JSON | | `EventToolUseStop` | `ToolResult` | Tool execution completed with result | | `EventThinkingDelta` | `Thinking` | Chain-of-thought reasoning (if model supports it) | | `EventHandoff` | `AgentName` | Control transferred to another agent | | `EventConfirmationRequired` | `ConfirmationRequest` | Tool awaiting human approval ([details](confirmation.md)) | | `EventComplete` | `Response` | Streaming finished — contains the full `ChatResponse` | | `EventError` | `Error` | An error occurred during streaming | | `EventWarning` | `Error` | A non-fatal warning | ## ChatEvent ```go type ChatEvent struct { Type types.EventType Content string // EventContentDelta Thinking string // 
EventThinkingDelta ToolCall *message.ToolCall // EventToolUseStart/Delta ToolResult *ToolExecutionResult // EventToolUseStop Response *ChatResponse // EventComplete Error error // EventError, EventWarning AgentName string // EventHandoff ConfirmationRequest *tool.ConfirmationRequest // EventConfirmationRequired } ``` --- ## Agent Framework > Hooks > Source: agent/hooks.md # Hooks Hooks let you observe, modify, or block agent behavior at key points in the execution pipeline. They cover tool calls, model interactions, error recovery, agent lifecycle, input validation, and cross-cutting event observation. ## Setup ```go myAgent := agent.New(llmClient, agent.WithHooks(agent.Hooks{ PreToolUse: func(ctx context.Context, tc agent.ToolUseContext) (agent.PreToolUseResult, error) { log.Printf("Tool call: %s (branch: %s)", tc.ToolName, tc.Branch) return agent.PreToolUseResult{Action: agent.HookAllow}, nil }, }), ) ``` ## Hook Types | Hook | Fires | Can | |------|-------|-----| | `PreToolUse` | Before a tool executes | Allow, Deny, or Modify input | | `PostToolUse` | After a tool executes | Allow or Modify output | | `PreModelCall` | Before an LLM request | Allow or Modify messages/tools | | `PostModelCall` | After an LLM response | Allow or Modify response | | `OnSubagentStart` | When a background sub-agent launches | Observe only | | `OnSubagentStop` | When a background sub-agent finishes | Observe only | | `OnToolError` | When a tool returns an error | Allow (re-raise) or Modify (recover) | | `OnModelError` | When an LLM call fails | Allow (re-raise) or Modify (recover) | | `BeforeAgent` | Before an agent starts its run | Allow, Deny, or Modify (short-circuit) | | `AfterAgent` | After an agent completes its run | Allow or Modify response | | `BeforeRun` | At the start of Chat/ChatStream | Observe only | | `AfterRun` | At the end of Chat/ChatStream | Observe only | | `OnUserMessage` | When a user message arrives | Allow, Deny, or Modify message | | `OnEvent` | On every hook 
event emitted | Observe only | ## HookAction Every hook returns a `HookAction` that controls what happens next: | Action | Behavior | |--------|----------| | `HookAllow` | Continue normally (default) | | `HookDeny` | Block execution (PreToolUse, BeforeAgent, OnUserMessage) | | `HookModify` | Replace input, output, messages, response, or recover from errors | ## Denying a Tool Call Return `HookDeny` from `PreToolUse` to block a tool before it runs: ```go agent.Hooks{ PreToolUse: func(_ context.Context, tc agent.ToolUseContext) (agent.PreToolUseResult, error) { if tc.ToolName == "dangerous_tool" { return agent.PreToolUseResult{ Action: agent.HookDeny, DenyReason: "this tool is not allowed", }, nil } return agent.PreToolUseResult{Action: agent.HookAllow}, nil }, } ``` The agent receives a tool error result with the deny reason. ## Modifying Tool Input Return `HookModify` from `PreToolUse` to rewrite the input before execution: ```go agent.Hooks{ PreToolUse: func(_ context.Context, tc agent.ToolUseContext) (agent.PreToolUseResult, error) { modified := strings.ReplaceAll(tc.Input, "SECRET", "[REDACTED]") return agent.PreToolUseResult{ Action: agent.HookModify, Input: modified, }, nil }, } ``` ## Modifying Model Messages Return `HookModify` from `PreModelCall` to inject or filter messages before they reach the LLM: ```go agent.Hooks{ PreModelCall: func(_ context.Context, mc agent.ModelCallContext) (agent.ModelCallResult, error) { extra := message.NewUserMessage("Remember: always respond in JSON.") return agent.ModelCallResult{ Action: agent.HookModify, Messages: append(mc.Messages, extra), Tools: mc.Tools, }, nil }, } ``` ## Error Recovery ### Tool Error Recovery `OnToolError` fires when a tool returns an error, before the error reaches `PostToolUse`. 
Return `HookModify` with replacement output to recover: ```go agent.Hooks{ OnToolError: func(_ context.Context, tc agent.ToolErrorContext) (agent.ToolErrorResult, error) { if tc.ToolName == "flaky_api" { return agent.ToolErrorResult{ Action: agent.HookModify, Output: "API temporarily unavailable, using cached data", }, nil } return agent.ToolErrorResult{Action: agent.HookAllow}, nil }, } ``` When recovery succeeds, the error flag is cleared and `PostToolUse` sees a non-error result. Multiple error callbacks chain — the first recovery wins. ### Model Error Recovery `OnModelError` fires when an LLM call fails. Return `HookModify` with a replacement response to recover: ```go agent.Hooks{ OnModelError: func(_ context.Context, mc agent.ModelErrorContext) (agent.ModelErrorResult, error) { return agent.ModelErrorResult{ Action: agent.HookModify, Response: &llm.Response{ Content: "Service temporarily unavailable. Please try again.", }, }, nil }, } ``` This works in both `Chat()` and `ChatStream()` paths. ## Agent Lifecycle ### Short-Circuiting with BeforeAgent `BeforeAgent` fires before an agent starts its run. Return `HookModify` with a response to skip the agent entirely: ```go agent.Hooks{ BeforeAgent: func(_ context.Context, ac agent.LifecycleContext) (agent.LifecycleResult, error) { if cached, ok := cache.Get(ac.Input); ok { return agent.LifecycleResult{ Action: agent.HookModify, Response: &agent.ChatResponse{Content: cached}, }, nil } return agent.LifecycleResult{Action: agent.HookAllow}, nil }, } ``` Return `HookDeny` to block the agent run with a nil response. ### Modifying with AfterAgent `AfterAgent` fires after an agent completes. 
Modify the response before it reaches the caller: ```go agent.Hooks{ AfterAgent: func(_ context.Context, ac agent.LifecycleContext) (agent.LifecycleResult, error) { modified := *ac.Response modified.Content = sanitize(modified.Content) return agent.LifecycleResult{ Action: agent.HookModify, Response: &modified, }, nil }, } ``` ## Run Lifecycle `BeforeRun` and `AfterRun` are observation-only hooks that fire at the very start and end of `Chat()`/`ChatStream()`: ```go agent.Hooks{ BeforeRun: func(_ context.Context, rc agent.RunContext) { metrics.StartTimer(rc.AgentName) }, AfterRun: func(_ context.Context, rc agent.RunContext) { metrics.RecordDuration(rc.AgentName, rc.Duration) if rc.Error != nil { metrics.RecordError(rc.AgentName, rc.Error) } }, } ``` `AfterRun` receives the final response, any error, and the total duration. ## Input Validation `OnUserMessage` fires when a user message arrives, before it reaches any agent logic. Use it to preprocess, validate, or reject messages: ```go agent.Hooks{ OnUserMessage: func(_ context.Context, uc agent.UserMessageContext) (agent.UserMessageResult, error) { if containsPII(uc.Message) { return agent.UserMessageResult{ Action: agent.HookDeny, DenyReason: "message contains PII", }, nil } return agent.UserMessageResult{ Action: agent.HookModify, Message: sanitizeInput(uc.Message), }, nil }, } ``` `OnUserMessage` does not fire for `Continue()`/`ContinueStream()` since those resume with tool results, not user messages. ## Cross-Cutting Event Observation `OnEvent` fires on every hook event emitted during execution. Use it for logging, analytics, or event transformation: ```go agent.Hooks{ OnEvent: func(_ context.Context, evt agent.HookEvent) { log.Printf("[%s] agent=%s tool=%s", evt.Type, evt.AgentName, evt.ToolName) }, } ``` `OnEvent` fires once per hook-point invocation (after all hooks in the chain have run), not once per registered hook. It covers all event types except itself. 
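Because `OnEvent` sees every hook-point invocation exactly once, it is a convenient place for lightweight aggregation. A library-free sketch of a thread-safe counter you might call from the callback (the `eventCounter` type is illustrative, not part of the library):

```go
package main

import (
	"fmt"
	"sync"
)

// eventCounter tallies hook events by type; the mutex makes it safe
// for callbacks fired from parallel tool execution.
type eventCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newEventCounter() *eventCounter {
	return &eventCounter{counts: make(map[string]int)}
}

// observe records one occurrence of the given event type.
func (c *eventCounter) observe(eventType string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[eventType]++
}

// count returns how many times eventType has fired so far.
func (c *eventCounter) count(eventType string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[eventType]
}

func main() {
	c := newEventCounter()
	// Inside agent.Hooks{OnEvent: ...} you would call
	// c.observe(string(evt.Type)) for each event.
	c.observe("pre_tool_use")
	c.observe("pre_tool_use")
	c.observe("post_model_call")
	fmt.Println(c.count("pre_tool_use"), c.count("post_model_call")) // 2 1
}
```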
## Chaining Multiple Hooks Pass multiple `Hooks` to `WithHooks`, or call `WithHooks` multiple times. Hooks run in registration order. ```go myAgent := agent.New(llmClient, agent.WithHooks(loggingHooks, guardRailHooks, metricsHooks), ) ``` Chain rules: - **Deny wins immediately** — if any hook returns `HookDeny`, later hooks are skipped - **Last Modify wins** — if multiple hooks return `HookModify`, the last one's value is used - **First recovery wins** — for error callbacks (`OnToolError`, `OnModelError`), the first `HookModify` response is used - **nil fields are skipped** — you only need to set the hooks you care about ## Observation with NewObservingHooks For pure observation (logging, metrics, streaming to a UI), use the `NewObservingHooks` helper. It wires all hooks to emit structured `HookEvent` values to a single callback: ```go myAgent := agent.New(llmClient, agent.WithHooks(agent.NewObservingHooks(func(evt agent.HookEvent) { log.Printf("[%s] agent=%s branch=%s tool=%s", evt.Type, evt.AgentName, evt.Branch, evt.ToolName) })), ) ``` All observing hooks return `HookAllow` — they never block or modify execution. `OnEvent` is left nil in observing hooks to avoid double-emission. ### HookEvent | Field | Type | Description | |-------|------|-------------| | `Type` | `HookEventType` | Event type (see below) | | `Timestamp` | `time.Time` | When the event fired | | `AgentName` | `string` | Name of the agent | | `TaskID` | `string` | Background task ID (if applicable) | | `Branch` | `string` | Agent hierarchy path (e.g. 
`"orchestrator/researcher"`) | | `ToolCallID` | `string` | Tool call ID (tool events only) | | `ToolName` | `string` | Tool name (tool events only) | | `Input` | `string` | Tool input, sub-agent task, or user message | | `Output` | `string` | Tool output or sub-agent result | | `IsError` | `bool` | Whether an error occurred | | `Duration` | `time.Duration` | Execution duration (post-events only) | | `Usage` | `llm.TokenUsage` | Token usage (post model call only) | | `Error` | `string` | Error message (if `IsError` is true) | ### Event Types | Constant | Value | When | |----------|-------|------| | `HookEventPreToolUse` | `"pre_tool_use"` | Before tool execution | | `HookEventPostToolUse` | `"post_tool_use"` | After tool execution | | `HookEventPreModelCall` | `"pre_model_call"` | Before LLM request | | `HookEventPostModelCall` | `"post_model_call"` | After LLM response | | `HookEventSubagentStart` | `"subagent_start"` | Background sub-agent launched | | `HookEventSubagentStop` | `"subagent_stop"` | Background sub-agent finished | | `HookEventToolError` | `"tool_error"` | Tool returned an error | | `HookEventModelError` | `"model_error"` | LLM call failed | | `HookEventBeforeAgent` | `"before_agent"` | Before agent starts | | `HookEventAfterAgent` | `"after_agent"` | After agent completes | | `HookEventBeforeRun` | `"before_run"` | Start of Chat/ChatStream | | `HookEventAfterRun` | `"after_run"` | End of Chat/ChatStream | | `HookEventUserMessage` | `"user_message"` | User message received | ## Branch The `Branch` field on all hook contexts gives you the agent hierarchy as a `/`-separated path. For a nested setup where an orchestrator delegates to a researcher which delegates to a scraper: ``` Branch: "orchestrator/researcher/scraper" ``` This lets you immediately see which agent in the hierarchy produced an event, without cross-referencing task IDs. 
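Since `Branch` is a plain `/`-separated path, scoping an observer to one subtree is simple string work. A library-free helper sketch (the function name is illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// underBranch reports whether branch equals root or lies beneath it in
// the agent hierarchy. The "/" is appended before the prefix check so
// that "orchestrator/reserved" does not match root "orchestrator/res".
func underBranch(branch, root string) bool {
	return branch == root || strings.HasPrefix(branch, root+"/")
}

func main() {
	fmt.Println(underBranch("orchestrator/researcher/scraper", "orchestrator/researcher")) // true
	fmt.Println(underBranch("orchestrator/reserved", "orchestrator/res"))                  // false
}
```

Inside a `NewObservingHooks` callback you could then drop any event where `!underBranch(evt.Branch, "orchestrator/researcher")` to watch a single subtree.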
## Hook Propagation Hooks set on a parent agent automatically propagate to sub-agents that don't have their own hooks: ```go orchestrator := agent.New(llmClient, agent.WithHooks(myHooks), agent.WithSubAgents( agent.SubAgentConfig{Name: "worker", Agent: worker}, ), ) // worker inherits myHooks since it has none of its own ``` If a sub-agent already has hooks configured, the parent's hooks are not applied. ## Context Structs ### ToolUseContext Passed to `PreToolUse` and embedded in `PostToolUseContext` and `ToolErrorContext`: ```go type ToolUseContext struct { ToolCallID string ToolName string Input string AgentName string TaskID string Branch string } ``` ### PostToolUseContext Passed to `PostToolUse`: ```go type PostToolUseContext struct { ToolUseContext // Embeds all fields from ToolUseContext Output string IsError bool Duration time.Duration } ``` ### ToolErrorContext Passed to `OnToolError`: ```go type ToolErrorContext struct { ToolUseContext // Embeds all fields from ToolUseContext Error error Output string Duration time.Duration } ``` ### ModelCallContext Passed to `PreModelCall`: ```go type ModelCallContext struct { Messages []message.Message Tools []tool.BaseTool AgentName string TaskID string Branch string } ``` ### ModelResponseContext Passed to `PostModelCall`: ```go type ModelResponseContext struct { Response *llm.Response Duration time.Duration AgentName string TaskID string Branch string Error error } ``` ### ModelErrorContext Passed to `OnModelError`: ```go type ModelErrorContext struct { Messages []message.Message Tools []tool.BaseTool Error error AgentName string TaskID string Branch string } ``` ### SubagentEventContext Passed to `OnSubagentStart` and `OnSubagentStop`: ```go type SubagentEventContext struct { TaskID string AgentName string Task string Branch string Result string Error error Duration time.Duration } ``` ### LifecycleContext Passed to `BeforeAgent` and `AfterAgent`: ```go type LifecycleContext struct { AgentName string TaskID string 
Branch string Input string Response *ChatResponse // nil for BeforeAgent, set for AfterAgent } ``` ### RunContext Passed to `BeforeRun` and `AfterRun`: ```go type RunContext struct { AgentName string TaskID string Branch string Input string Response *ChatResponse // nil for BeforeRun, set for AfterRun Error error // nil for BeforeRun, set for AfterRun if failed Duration time.Duration // zero for BeforeRun } ``` ### UserMessageContext Passed to `OnUserMessage`: ```go type UserMessageContext struct { Message string AgentName string TaskID string Branch string } ``` ## Streaming to a UI A common use case is forwarding hook events to a frontend over WebSocket or SSE: ```go agent.NewObservingHooks(func(evt agent.HookEvent) { data, _ := json.Marshal(evt) websocket.Send(data) }) ``` This gives the UI real-time visibility into tool calls, model interactions, error recovery, and agent lifecycle — including nested agent hierarchies via `Branch`. --- ## Agent Framework > Tool Confirmation > Source: agent/confirmation.md # Tool Confirmation The confirmation protocol lets tools require human approval before executing. The framework provides the mechanism — consumers provide the UI/interaction layer. ## Setup Register a `ConfirmationProvider` on the agent. The provider is called whenever a tool requires confirmation and blocks until the consumer provides a decision. ```go myAgent := agent.New(llmClient, agent.WithTools(&DeleteTool{}), agent.WithConfirmationProvider( func(ctx context.Context, req tool.ConfirmationRequest) (bool, error) { // Present req to the user, wait for their decision return askUser(req.ToolName, req.Input, req.Hint), nil }, ), ) ``` Return `true` to approve, `false` to reject. If the provider returns an error, the tool call fails with that error. 
## Declarative Confirmation Set `RequireConfirmation` on a tool's `Info` to require approval before `Run()` is called: ```go func (t *DeleteTool) Info() tool.Info { info := tool.NewInfo("delete_records", "Delete database records", DeleteParams{}) info.RequireConfirmation = true return info } ``` When the agent encounters this tool, it calls the `ConfirmationProvider` before executing. If no provider is configured, the tool runs normally — confirmation is opt-in. ## Dynamic Confirmation Tools can request confirmation from within `Run()` for conditional approval: ```go func (t *TransferTool) Run(ctx context.Context, params tool.Call) (tool.Response, error) { var input TransferParams json.Unmarshal([]byte(params.Input), &input) if input.Amount > 10000 { err := tool.RequestConfirmation(ctx, "Large transfer exceeding $10,000", input) if err != nil { return tool.Response{}, err } } // Proceed with transfer return tool.NewTextResponse("Transfer complete"), nil } ``` `RequestConfirmation` blocks until the consumer decides. If rejected, it returns `tool.ErrConfirmationRejected` — propagate this error to halt execution. If no `ConfirmationProvider` is configured, `RequestConfirmation` is a no-op (auto-approve). ## ConfirmationRequest The provider receives a `ConfirmationRequest` with context about the tool call: ```go type ConfirmationRequest struct { ToolCallID string // Unique ID of this tool call ToolName string // Name of the tool Input string // JSON-encoded arguments Hint string // Human-readable description (dynamic confirmation only) Payload any // Arbitrary structured data (dynamic confirmation only) } ``` For declarative confirmation (`RequireConfirmation` flag), `Hint` and `Payload` are empty. For dynamic confirmation (`RequestConfirmation`), they carry the values passed by the tool. 
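A provider can therefore use the empty `Hint` to tell the two flavors apart when rendering a prompt. A library-free formatting sketch (the helper takes the request fields as plain strings for illustration):

```go
package main

import "fmt"

// describeRequest builds a one-line prompt for a pending tool call.
// Declarative confirmations carry no hint; dynamic ones explain why
// the tool paused, so surface that reason when it is present.
func describeRequest(toolName, input, hint string) string {
	if hint == "" {
		return fmt.Sprintf("%s requires approval (input: %s)", toolName, input)
	}
	return fmt.Sprintf("%s paused: %s (input: %s)", toolName, hint, input)
}

func main() {
	fmt.Println(describeRequest("delete_records", `{"ids":[1,2]}`, ""))
	fmt.Println(describeRequest("transfer_funds", `{"amount":25000}`, "Large transfer exceeding $10,000"))
}
```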
## Toolset-Level Confirmation Use `tool.WithConfirmation` to mark all tools in a toolset as requiring confirmation: ```go dangerousTools := tool.NewToolset("dangerous", &DeleteTool{}, &DropTableTool{}, &FormatDiskTool{}, ) confirmed := tool.WithConfirmation(dangerousTools) myAgent := agent.New(llmClient, agent.WithToolsets(confirmed), agent.WithConfirmationProvider(myProvider), ) ``` This sets `RequireConfirmation = true` on every tool in the toolset without modifying the originals. ## Streaming In the streaming path (`ChatStream`), an `EventConfirmationRequired` event is emitted before the provider blocks. This allows the consumer to present a UI and then unblock the provider: ```go for event := range myAgent.ChatStream(ctx, "Delete old records") { switch event.Type { case types.EventConfirmationRequired: req := event.ConfirmationRequest fmt.Printf("Tool %q wants to run with input: %s\n", req.ToolName, req.Input) // The provider is blocking — respond via whatever mechanism it uses case types.EventContentDelta: fmt.Print(event.Content) case types.EventComplete: fmt.Println("\nDone!") } } ``` A common pattern is to use a channel-based provider that the streaming consumer unblocks: ```go type approval struct { approved bool ch chan struct{} } pending := make(map[string]*approval) var mu sync.Mutex provider := func(ctx context.Context, req tool.ConfirmationRequest) (bool, error) { a := &approval{ch: make(chan struct{})} mu.Lock() pending[req.ToolCallID] = a mu.Unlock() <-a.ch // Block until consumer decides return a.approved, nil } // In the stream consumer, when EventConfirmationRequired arrives: // mu.Lock() // a := pending[req.ToolCallID] // mu.Unlock() // a.approved = userClickedApprove // close(a.ch) ``` ## Interaction with Hooks `PreToolUse` hooks run before confirmation. 
If a hook denies the tool, the confirmation provider is never called: ``` PreToolUse hooks → Confirmation check → tool.Run() ``` This means hooks enforce policy (rate limits, blocklists), while confirmation handles human approval. ## Handoffs Each agent has its own `ConfirmationProvider`. When a handoff occurs, the new agent's provider is used. If the target agent has no provider, its tools run without confirmation. ## Auto-Approve Patterns The provider is a regular function — implement any approval logic: ```go // Always approve (useful for testing) agent.WithConfirmationProvider( func(_ context.Context, _ tool.ConfirmationRequest) (bool, error) { return true, nil }, ) // Check a database of pre-approved tools agent.WithConfirmationProvider( func(ctx context.Context, req tool.ConfirmationRequest) (bool, error) { return db.IsToolPreApproved(ctx, userID, req.ToolName) }, ) // Approve safe tools, prompt for dangerous ones agent.WithConfirmationProvider( func(ctx context.Context, req tool.ConfirmationRequest) (bool, error) { if req.ToolName == "read_file" { return true, nil } return promptUser(ctx, req) }, ) ``` --- ## Agent Framework > Sub-Agents > Source: agent/sub-agents.md # Sub-Agents Sub-agents let an orchestrator delegate tasks to specialized child agents. Each sub-agent becomes a callable tool. ## Setup ```go researcher := agent.New(llmClient, agent.WithSystemPrompt("You are a research specialist."), agent.WithTools(&webSearchTool{}), ) writer := agent.New(llmClient, agent.WithSystemPrompt("You are a content writer."), ) orchestrator := agent.New(llmClient, agent.WithSystemPrompt("You coordinate research and writing tasks."), agent.WithSubAgents( agent.SubAgentConfig{Name: "researcher", Description: "Researches topics", Agent: researcher}, agent.SubAgentConfig{Name: "writer", Description: "Writes content", Agent: writer}, ), ) response, _ := orchestrator.Chat(ctx, "Research and write about quantum computing") ``` ## How It Works 1. 
Each `SubAgentConfig` registers a tool named after the sub-agent 2. The orchestrator LLM decides when to delegate a task 3. The sub-agent runs to completion and returns its response 4. The orchestrator continues with the sub-agent's output ## SubAgentConfig ```go type SubAgentConfig struct { Name string // Tool name the orchestrator calls Description string // Describes when to use this sub-agent Agent *Agent // The sub-agent instance } ``` ## Background Execution Sub-agents can run asynchronously by passing `background: true`. The orchestrator gets a `task_id` immediately and can check status or wait for results later. ```go orchestrator := agent.New(llmClient, agent.WithSystemPrompt(`Launch background tasks, then collect results.`), agent.WithSubAgents( agent.SubAgentConfig{ Name: "researcher", Description: "Research a topic. Supports background: true for async execution.", Agent: researcher, }, ), ) ``` When the LLM calls the sub-agent with `background: true`: 1. The task launches in a goroutine and returns `{"task_id": "task-1", "status": "launched"}` 2. Three task management tools are automatically available: `get_task_result`, `stop_task`, `list_tasks` 3. The orchestrator uses `get_task_result` with `wait: true` to collect results See [Background Agents](background-agents.md) for the full tool reference and examples. --- ## Agent Framework > Background Agents > Source: agent/background-agents.md # Background Agents Background agents let the orchestrator launch sub-agents asynchronously. Tasks run in goroutines and the orchestrator can continue working, check status, or wait for results. ## Setup ```go researcher := agent.New(llmClient, agent.WithSystemPrompt("You are a concise research assistant."), ) orchestrator := agent.New(llmClient, agent.WithSystemPrompt(`You coordinate research tasks. 1. Launch background tasks with background: true 2. Collect results with get_task_result (wait: true) 3. 
Synthesize the results`), agent.WithSubAgents( agent.SubAgentConfig{ Name: "researcher", Description: "Research a topic. Supports background: true for async execution.", Agent: researcher, }, ), ) ``` ## How It Works 1. The orchestrator calls a sub-agent tool with `background: true` 2. The sub-agent launches in a goroutine and returns a `task_id` immediately 3. Three task management tools are auto-registered for the orchestrator: | Tool | Description | |------|-------------| | `get_task_result` | Check status or wait for a background task to complete | | `stop_task` | Cancel a running background task | | `list_tasks` | List all background tasks and their status | ## Task Lifecycle Tasks move through these states: | Status | Description | |--------|-------------| | `running` | Task is currently executing | | `completed` | Task finished successfully | | `failed` | Task encountered an error | | `cancelled` | Task was explicitly cancelled | ## Tool Reference ### get_task_result | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `task_id` | string | yes | The task ID returned when the task was launched | | `wait` | bool | no | If true, block until the task completes | | `timeout` | int | no | Max wait time in milliseconds. 0 means no timeout | ### stop_task | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `task_id` | string | yes | The task ID to cancel | ### list_tasks No parameters. Returns all tasks with their ID, agent name, and status. ## Streaming Example ```go for event := range orchestrator.ChatStream(ctx, "Compare Go and Rust. 
Research each in the background.") { switch event.Type { case types.EventContentDelta: fmt.Print(event.Content) case types.EventError: log.Fatal(event.Error) } if event.ToolResult != nil { fmt.Printf("\n[Tool: %s → %s]\n", event.ToolResult.ToolName, event.ToolResult.Output) } } ``` ## Sub-Agent Input When a sub-agent is called, it accepts these parameters: | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `task` | string | yes | The task or question to send to the sub-agent | | `background` | bool | no | If true, run in background and return a task ID | | `max_turns` | int | no | Maximum tool-execution turns. 0 uses the agent default | --- ## Agent Framework > Handoffs > Source: agent/handoffs.md # Handoffs Handoffs transfer full control from one agent to another. Unlike sub-agents (which return results to the orchestrator), handoffs permanently switch the active agent. ## Setup ```go billing := agent.New(llmClient, agent.WithSystemPrompt("You handle billing inquiries."), ) support := agent.New(llmClient, agent.WithSystemPrompt("You handle technical support."), ) triage := agent.New(llmClient, agent.WithSystemPrompt("Route the user to the right specialist."), agent.WithHandoffs( agent.HandoffConfig{Name: "billing", Description: "Billing questions", Agent: billing}, agent.HandoffConfig{Name: "support", Description: "Technical issues", Agent: support}, ), ) response, _ := triage.Chat(ctx, "I was charged twice on my last invoice") fmt.Println(response.AgentName) // "billing" ``` ## How It Works 1. Each `HandoffConfig` auto-generates a `transfer_to_` tool 2. When the triage agent calls `transfer_to_billing`, control transfers permanently 3. The billing agent's system prompt replaces the triage agent's 4. The conversation history carries over 5. 
`ChatResponse.AgentName` indicates which agent produced the final response ## HandoffConfig ```go type HandoffConfig struct { Name string // Used to generate transfer_to_ tool Description string // Tells the LLM when to transfer Agent *Agent // The target agent } ``` ## Handoffs vs Sub-Agents | | Sub-Agents | Handoffs | |---|---|---| | Control flow | Returns to orchestrator | Permanent transfer | | System prompt | Sub-agent uses its own | Replaces current | | Use case | Task delegation | Routing/triage | --- ## Agent Framework > Fan-Out > Source: agent/fan-out.md # Fan-Out Fan-out distributes multiple tasks to worker agents in parallel and collects results. ## Setup ```go researcher := agent.New(llmClient, agent.WithSystemPrompt("Research the given topic thoroughly."), ) coordinator := agent.New(llmClient, agent.WithSystemPrompt("You coordinate parallel research tasks."), agent.WithFanOut(agent.FanOutConfig{ Name: "research", Description: "Research multiple topics in parallel", Agent: researcher, MaxConcurrency: 3, }), ) response, _ := coordinator.Chat(ctx, "Compare AI, blockchain, and quantum computing") ``` ## How It Works 1. The `FanOutConfig` registers a tool that accepts multiple tasks 2. When the coordinator calls the fan-out tool, all tasks run concurrently 3. `MaxConcurrency` limits how many worker agents run at the same time 4. Results are collected and returned to the coordinator ## FanOutConfig ```go type FanOutConfig struct { Name string // Tool name Description string // Describes when to use fan-out Agent *Agent // Worker agent (cloned per task) MaxConcurrency int // Max parallel workers (0 = unlimited) } ``` --- ## Agent Framework > Continue/Resume > Source: agent/continue.md # Continue/Resume `Continue()` lets you manually execute tool calls and feed results back into the agent loop. This is useful when tools require human approval, external API calls, or custom execution logic. 
## Setup ```go myAgent := agent.New(llmClient, agent.WithAutoExecute(false), // Don't auto-execute tools agent.WithSession("conv-1", session.MemoryStore()), ) ``` ## Usage ```go // First call returns pending tool calls instead of executing them response, _ := myAgent.Chat(ctx, "Search for flights to Tokyo") // Inspect what tools the LLM wants to call for _, tc := range response.ToolCalls { fmt.Printf("Tool: %s, Input: %s\n", tc.Name, tc.Input) } // Execute tools externally with your own logic results := []message.ToolResult{ { ToolCallID: response.ToolCalls[0].ID, Name: "search_flights", Content: `{"flights": [{"airline": "JAL", "price": 850}]}`, }, } // Resume the agent loop with results response, _ = myAgent.Continue(ctx, results) fmt.Println(response.Content) ``` ## Streaming Variant ```go for event := range myAgent.ContinueStream(ctx, results) { switch event.Type { case types.EventContentDelta: fmt.Print(event.Content) case types.EventComplete: fmt.Println("\nDone!") } } ``` > **Note:** > `Continue()` requires a session to be configured, since it needs to restore conversation state from the previous `Chat()` call. > --- ## Agent Framework > Context Strategies > Source: agent/context-strategies.md # Context Strategies Context strategies automatically manage the context window when conversations grow beyond token limits. 
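Conceptually, every one of these strategies reduces a message slice before it is sent to the model. A stdlib-only sketch of the simplest case — what a "keep the last N messages" sliding window does — using a hypothetical `Message` stand-in rather than the library's real types:

```go
package main

import "fmt"

// Message is an illustrative stand-in for the library's message type.
type Message struct {
	Role, Content string
}

// keepLast mimics what a sliding-window strategy does conceptually:
// drop everything except the most recent n messages.
func keepLast(msgs []Message, n int) []Message {
	if len(msgs) <= n {
		return msgs
	}
	return msgs[len(msgs)-n:]
}

func main() {
	history := []Message{
		{"user", "msg 1"}, {"assistant", "msg 2"},
		{"user", "msg 3"}, {"assistant", "msg 4"},
	}
	trimmed := keepLast(history, 2)
	fmt.Println(len(trimmed), trimmed[0].Content) // 2 msg 3
}
```

The real strategies also count tokens and can emit session updates (e.g. a summary message); this only shows the trimming idea.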
## Available Strategies ### Sliding Window Keep only the last N messages: ```go import "github.com/joakimcarlsson/ai/tokens/sliding" myAgent := agent.New(llmClient, agent.WithSystemPrompt("You are a helpful assistant."), agent.WithSession("conv-1", store), agent.WithContextStrategy(sliding.Strategy(sliding.KeepLast(10)), 0), ) ``` ### Truncate Remove oldest messages to fit the token budget: ```go import "github.com/joakimcarlsson/ai/tokens/truncate" myAgent := agent.New(llmClient, agent.WithContextStrategy(truncate.Strategy(), 0), ) ``` ### Summarize Use an LLM to compress older messages into a summary: ```go import "github.com/joakimcarlsson/ai/tokens/summarize" myAgent := agent.New(llmClient, agent.WithContextStrategy(summarize.Strategy(llmClient), 0), ) ``` ## How It Works Before each LLM call, the agent: 1. Counts tokens for all messages + system prompt + tools 2. If total exceeds the limit, applies the strategy 3. The strategy reduces messages while preserving recent context 4. The session is updated if the strategy produces a session update (e.g., summary message) ## Custom Max Tokens The second argument to `WithContextStrategy` sets a custom max token limit. Pass `0` to auto-calculate from the model's context window minus a 4096-token reserve. ```go // Custom limit: 50k tokens agent.WithContextStrategy(sliding.Strategy(sliding.KeepLast(20)), 50000) ``` ## Custom Strategy Implement the `tokens.Strategy` interface: ```go type Strategy interface { Fit(ctx context.Context, input StrategyInput) (*StrategyResult, error) } ``` --- ## Agent Framework > Toolsets > Source: agent/toolsets.md # Toolsets Toolsets group multiple tools under a name with optional dynamic filtering. Unlike static tool lists, toolsets are resolved **per-call** — the predicate runs on every `Chat()` turn, so you can enable or disable tools based on runtime context. 
## Creating a Toolset A basic toolset is a named collection of tools: ```go recon := tool.NewToolset("recon", &NmapTool{}, &DnsLookupTool{}, &WhoisTool{}, ) a := agent.New(llmClient, agent.WithToolsets(recon), ) ``` You can mix toolsets with individual tools: ```go a := agent.New(llmClient, agent.WithTools(&AlwaysAvailableTool{}), agent.WithToolsets(recon, exploitation), ) ``` ## Filtered Toolsets `NewFilterToolset` wraps a toolset with a predicate that controls which tools are available. The predicate receives the `context.Context` and each tool, and returns whether that tool should be included. ```go type phaseKey struct{} allTools := tool.NewToolset("pentest", &NmapTool{}, &SqlInjectionTool{}, &BruteForcePasswordTool{}, ) filtered := tool.NewFilterToolset("phase-aware", allTools, func(ctx context.Context, t tool.BaseTool) bool { phase, _ := ctx.Value(phaseKey{}).(string) switch t.Info().Name { case "sql_injection", "brute_force_password": return phase == "exploitation" default: return true } }, ) a := agent.New(llmClient, agent.WithToolsets(filtered), ) // During recon phase, only NmapTool is available ctx := context.WithValue(ctx, phaseKey{}, "recon") resp, _ := a.Chat(ctx, "Start scanning the target") // During exploitation phase, all tools are available ctx = context.WithValue(ctx, phaseKey{}, "exploitation") resp, _ = a.Chat(ctx, "Try exploiting the SQL injection") ``` ### Filtering by Configuration Predicates can also read from engagement configuration or any other source: ```go type EngagementConfig struct { AllowBruteForce bool AllowExploits bool } configKey := struct{}{} filtered := tool.NewFilterToolset("engagement", allTools, func(ctx context.Context, t tool.BaseTool) bool { cfg, _ := ctx.Value(configKey).(*EngagementConfig) if cfg == nil { return false } switch t.Info().Name { case "brute_force": return cfg.AllowBruteForce case "sql_injection", "xss_scanner": return cfg.AllowExploits default: return true } }, ) ``` ## Composing Toolsets Toolsets 
compose — use `NewCompositeToolset` to merge multiple toolsets into one: ```go recon := tool.NewToolset("recon", &NmapTool{}, &DnsLookupTool{}) exploit := tool.NewToolset("exploit", &SqlInjectionTool{}) reporting := tool.NewToolset("reporting", &ReportTool{}) all := tool.NewCompositeToolset("full-suite", recon, exploit, reporting) ``` Composite toolsets work with filtered toolsets too — you can filter individual groups and then compose them: ```go filteredExploit := tool.NewFilterToolset("filtered-exploit", exploit, exploitPredicate) combined := tool.NewCompositeToolset("suite", recon, filteredExploit, reporting) ``` ## MCP Toolsets Wrap MCP server tools as a toolset: ```go mcpTools := tool.MCPToolset("external", map[string]tool.MCPServer{ "filesystem": { Command: "npx", Args: []string{"-y", "@modelcontextprotocol/server-filesystem", "/tmp"}, Type: tool.MCPStdio, }, }) a := agent.New(llmClient, agent.WithToolsets(mcpTools), ) ``` ## Confirmation Wrapper `tool.WithConfirmation` wraps a toolset so every tool in it requires human approval before execution. Pair it with `WithConfirmationProvider` on the agent: ```go dangerous := tool.NewToolset("exploits", &SqlInjectionTool{}, &BruteForcePasswordTool{}, ) a := agent.New(llmClient, agent.WithToolsets(tool.WithConfirmation(dangerous)), agent.WithConfirmationProvider(myApprovalHandler), ) ``` The original toolset is not modified. See [Tool Confirmation](confirmation.md) for the full protocol. 
## Toolsets and Hooks Since toolsets resolve to `[]tool.BaseTool`, [hooks](hooks.md) apply to individual tools regardless of how they were grouped: ```go a := agent.New(llmClient, agent.WithToolsets(exploitToolset), agent.WithHooks(agent.Hooks{ PreToolUse: func(ctx context.Context, tc agent.ToolUseContext) (agent.PreToolUseResult, error) { if tc.ToolName == "sql_injection" { return agent.PreToolUseResult{ Action: agent.HookDeny, DenyReason: "SQL injection blocked by policy", }, nil } return agent.PreToolUseResult{Action: agent.HookAllow}, nil }, }), ) ``` ## Custom Toolset Implementations The `Toolset` interface is simple — implement it for custom resolution logic: ```go type Toolset interface { Name() string Tools(ctx context.Context) []tool.BaseTool } ``` For example, a toolset that loads tools from a database: ```go type DBToolset struct { db *sql.DB } func (d *DBToolset) Name() string { return "db-tools" } func (d *DBToolset) Tools(ctx context.Context) []tool.BaseTool { // Query available tools from database based on user permissions rows, _ := d.db.QueryContext(ctx, "SELECT name, config FROM tools WHERE enabled = true") // ... build and return tools } ``` --- ## Agent Framework > Instruction Templates > Source: agent/instruction-templates.md # Instruction Templates Dynamic system prompts using template variables or runtime-generated instructions. ## Static Templates Use Go template syntax (`{{.var}}`) with `WithState`: ```go myAgent := agent.New(llmClient, agent.WithSystemPrompt("You are {{.role}}. Help {{.user_name}} with their tasks."), agent.WithState(map[string]any{ "role": "a coding assistant", "user_name": "Alice", }), ) ``` ## Conditional Templates ```go myAgent := agent.New(llmClient, agent.WithSystemPrompt(`You are a helpful assistant. 
{{if .extra_context}} Additional context: {{.extra_context}} {{end}}`), agent.WithState(map[string]any{ "extra_context": "The user prefers concise answers.", }), ) ``` ## Dynamic Provider For fully dynamic prompts generated at runtime: ```go myAgent := agent.New(llmClient, agent.WithInstructionProvider(func(ctx context.Context, state map[string]any) (string, error) { return fmt.Sprintf( "Current time: %s\nYou are a helpful assistant.", time.Now().Format(time.RFC3339), ), nil }), ) ``` The instruction provider receives the state map and can use it alongside any other runtime data (database lookups, feature flags, etc.). --- ## Integrations > PostgreSQL > Source: integrations/postgres.md # PostgreSQL PostgreSQL-backed session store for persistent conversation history. No extensions required. ## Installation ```bash go get github.com/joakimcarlsson/ai/integrations/postgres ``` ## Setup ```go import "github.com/joakimcarlsson/ai/integrations/postgres" sessionStore, err := postgres.SessionStore(ctx, "postgres://user:pass@localhost:5432/mydb?sslmode=disable") if err != nil { log.Fatal(err) } myAgent := agent.New(llmClient, agent.WithSession("conv-1", sessionStore), ) ``` Tables and indexes are created automatically on first use. ## Schema ```sql CREATE TABLE sessions ( id TEXT PRIMARY KEY, created_at TIMESTAMPTZ DEFAULT NOW() ); CREATE TABLE messages ( id TEXT PRIMARY KEY, session_id TEXT NOT NULL REFERENCES sessions(id) ON DELETE CASCADE, role TEXT NOT NULL, parts JSONB NOT NULL, model TEXT, created_at BIGINT NOT NULL ); CREATE INDEX messages_session_idx ON messages(session_id, created_at); ``` ## Options | Option | Description | |--------|-------------| | `postgres.WithIDGenerator(fn)` | Custom ID generator for message records. 
Default: UUID v4 | ```go store, err := postgres.SessionStore(ctx, connString, postgres.WithIDGenerator(func() string { return myCustomID() }), ) ``` ## Full Example ```go package main import ( "context" "fmt" "log" "os" "github.com/joakimcarlsson/ai/agent" "github.com/joakimcarlsson/ai/agent/memory" "github.com/joakimcarlsson/ai/embeddings" "github.com/joakimcarlsson/ai/integrations/pgvector" "github.com/joakimcarlsson/ai/integrations/postgres" "github.com/joakimcarlsson/ai/model" llm "github.com/joakimcarlsson/ai/providers" ) func main() { ctx := context.Background() connString := "postgres://postgres:password@localhost:5432/example?sslmode=disable" embedder, err := embeddings.NewEmbedding( model.ProviderOpenAI, embeddings.WithAPIKey(os.Getenv("OPENAI_API_KEY")), embeddings.WithModel(model.OpenAIEmbeddingModels[model.TextEmbedding3Small]), ) if err != nil { log.Fatal(err) } llmClient, err := llm.NewLLM( model.ProviderOpenAI, llm.WithAPIKey(os.Getenv("OPENAI_API_KEY")), llm.WithModel(model.OpenAIModels[model.GPT4o]), ) if err != nil { log.Fatal(err) } sessionStore, err := postgres.SessionStore(ctx, connString) if err != nil { log.Fatal(err) } memoryStore, err := pgvector.MemoryStore(ctx, connString, embedder) if err != nil { log.Fatal(err) } myAgent := agent.New(llmClient, agent.WithSystemPrompt("You are a personal assistant with memory."), agent.WithSession("conv-1", sessionStore), agent.WithMemory("alice", memoryStore, memory.AutoExtract(), memory.AutoDedup(), ), ) response, err := myAgent.Chat(ctx, "Hi! My name is Alice and I love Italian food.") if err != nil { log.Fatal(err) } fmt.Println(response.Content) } ``` --- ## Integrations > SQLite > Source: integrations/sqlite.md # SQLite SQLite-backed session store for lightweight persistent conversation history. Bring your own `*sql.DB` connection with any SQLite driver. 
## Installation ```bash go get github.com/joakimcarlsson/ai/integrations/sqlite ``` ## Setup ```go import ( "database/sql" _ "modernc.org/sqlite" // or any SQLite driver "github.com/joakimcarlsson/ai/integrations/sqlite" ) db, err := sql.Open("sqlite", "./chat.db") if err != nil { log.Fatal(err) } sessionStore, err := sqlite.SessionStore(ctx, db) if err != nil { log.Fatal(err) } myAgent := agent.New(llmClient, agent.WithSession("conv-1", sessionStore), ) ``` Tables and indexes are created automatically on first use. ## Schema ```sql CREATE TABLE sessions ( id TEXT PRIMARY KEY, created_at INTEGER NOT NULL ); CREATE TABLE messages ( id INTEGER PRIMARY KEY AUTOINCREMENT, session_id TEXT NOT NULL REFERENCES sessions(id) ON DELETE CASCADE, role TEXT NOT NULL, parts TEXT NOT NULL, model TEXT, created_at INTEGER NOT NULL ); CREATE INDEX idx_messages_session ON messages(session_id, id); ``` ## Options | Option | Description | |--------|-------------| | `sqlite.WithTablePrefix(prefix)` | Prefix for all table names. 
Useful for multi-tenant or multiple stores in one database | ```go store, err := sqlite.SessionStore(ctx, db, sqlite.WithTablePrefix("chat_"), ) // Creates "chat_sessions" and "chat_messages" instead of "sessions" and "messages" ``` ## Full Example ```go package main import ( "context" "database/sql" "fmt" "log" "os" _ "modernc.org/sqlite" "github.com/joakimcarlsson/ai/agent" "github.com/joakimcarlsson/ai/integrations/sqlite" "github.com/joakimcarlsson/ai/model" llm "github.com/joakimcarlsson/ai/providers" ) func main() { ctx := context.Background() db, err := sql.Open("sqlite", "./chat.db") if err != nil { log.Fatal(err) } defer db.Close() llmClient, err := llm.NewLLM( model.ProviderOpenAI, llm.WithAPIKey(os.Getenv("OPENAI_API_KEY")), llm.WithModel(model.OpenAIModels[model.GPT4o]), ) if err != nil { log.Fatal(err) } sessionStore, err := sqlite.SessionStore(ctx, db) if err != nil { log.Fatal(err) } myAgent := agent.New(llmClient, agent.WithSystemPrompt("You are a helpful assistant."), agent.WithSession("conv-1", sessionStore), ) response, err := myAgent.Chat(ctx, "Hello!") if err != nil { log.Fatal(err) } fmt.Println(response.Content) } ``` --- ## Integrations > pgvector > Source: integrations/pgvector.md # pgvector PostgreSQL-backed memory store using [pgvector](https://github.com/pgvector/pgvector) for semantic vector search. Stores facts as embeddings and retrieves them using cosine similarity with HNSW indexing. ## Prerequisites pgvector extension must be available in your PostgreSQL instance. The extension is enabled automatically on first use. 
## Installation ```bash go get github.com/joakimcarlsson/ai/integrations/pgvector ``` ## Setup ```go import ( "github.com/joakimcarlsson/ai/integrations/pgvector" "github.com/joakimcarlsson/ai/agent/memory" ) memoryStore, err := pgvector.MemoryStore(ctx, "postgres://user:pass@localhost:5432/mydb?sslmode=disable", embedder) if err != nil { log.Fatal(err) } myAgent := agent.New(llmClient, agent.WithMemory("user-123", memoryStore, memory.AutoExtract(), memory.AutoDedup(), ), ) ``` The table, pgvector extension, and HNSW index are created automatically on first use. The vector dimension is auto-detected from the embedder's model configuration. ## Schema ```sql CREATE EXTENSION IF NOT EXISTS vector; CREATE TABLE memories ( id TEXT PRIMARY KEY, owner_id TEXT NOT NULL, content TEXT NOT NULL, vector vector(1536), -- dimension from embedder metadata JSONB, created_at TIMESTAMPTZ DEFAULT NOW() ); CREATE INDEX memories_owner_idx ON memories(owner_id); CREATE INDEX memories_vector_idx ON memories USING hnsw (vector vector_cosine_ops); ``` ## Options | Option | Description | |--------|-------------| | `pgvector.WithIDGenerator(fn)` | Custom ID generator for memory records. 
Default: UUID v4 | ```go store, err := pgvector.MemoryStore(ctx, connString, embedder, pgvector.WithIDGenerator(func() string { return myCustomID() }), ) ``` ## Full Example ```go package main import ( "context" "fmt" "log" "os" "github.com/joakimcarlsson/ai/agent" "github.com/joakimcarlsson/ai/agent/memory" "github.com/joakimcarlsson/ai/embeddings" "github.com/joakimcarlsson/ai/integrations/pgvector" "github.com/joakimcarlsson/ai/integrations/postgres" "github.com/joakimcarlsson/ai/model" llm "github.com/joakimcarlsson/ai/providers" ) func main() { ctx := context.Background() connString := "postgres://postgres:password@localhost:5432/example?sslmode=disable" embedder, err := embeddings.NewEmbedding( model.ProviderOpenAI, embeddings.WithAPIKey(os.Getenv("OPENAI_API_KEY")), embeddings.WithModel(model.OpenAIEmbeddingModels[model.TextEmbedding3Small]), ) if err != nil { log.Fatal(err) } llmClient, err := llm.NewLLM( model.ProviderOpenAI, llm.WithAPIKey(os.Getenv("OPENAI_API_KEY")), llm.WithModel(model.OpenAIModels[model.GPT4o]), ) if err != nil { log.Fatal(err) } // PostgreSQL sessions + pgvector memory sessionStore, err := postgres.SessionStore(ctx, connString) if err != nil { log.Fatal(err) } memoryStore, err := pgvector.MemoryStore(ctx, connString, embedder) if err != nil { log.Fatal(err) } myAgent := agent.New(llmClient, agent.WithSystemPrompt("You are a personal assistant with memory."), agent.WithSession("conv-1", sessionStore), agent.WithMemory("alice", memoryStore, memory.AutoExtract(), memory.AutoDedup(), ), ) // First conversation — agent learns facts response, err := myAgent.Chat(ctx, "Hi! 
My name is Alice and I love Italian food.") if err != nil { log.Fatal(err) } fmt.Println(response.Content) // New conversation — agent recalls memories via vector search agent2 := agent.New(llmClient, agent.WithSystemPrompt("You are a personal assistant with memory."), agent.WithSession("conv-2", sessionStore), agent.WithMemory("alice", memoryStore, memory.AutoExtract(), memory.AutoDedup(), ), ) response, err = agent2.Chat(ctx, "Can you recommend a restaurant for me?") if err != nil { log.Fatal(err) } fmt.Println(response.Content) } ``` --- ## Advanced > Batch Processing > Source: advanced/batch-processing.md # Batch Processing Process bulk LLM and embedding requests efficiently using provider-native batch APIs or bounded concurrent execution. ## Native Batch APIs Native batch APIs submit all requests as a single job that processes asynchronously on the provider side. Providers may offer reduced pricing for batch workloads (see the provider support table below for details). Results are typically returned within 24 hours, often much faster. 
### OpenAI

```go
import (
	"log"
	"time"

	"github.com/joakimcarlsson/ai/batch"
	"github.com/joakimcarlsson/ai/message"
	"github.com/joakimcarlsson/ai/model"
)

proc, _ := batch.New(
	model.ProviderOpenAI,
	batch.WithAPIKey("your-api-key"),
	batch.WithModel(model.OpenAIModels[model.GPT4o]),
	batch.WithPollInterval(30*time.Second),
)

requests := []batch.Request{
	{
		ID:   "q1",
		Type: batch.RequestTypeChat,
		Messages: []message.Message{
			message.NewUserMessage("What is the capital of France?"),
		},
	},
	{
		ID:   "q2",
		Type: batch.RequestTypeChat,
		Messages: []message.Message{
			message.NewUserMessage("What is the capital of Japan?"),
		},
	},
}

resp, err := proc.Process(ctx, requests)
if err != nil {
	log.Fatal(err)
}
for _, r := range resp.Results {
	if r.Err != nil {
		fmt.Printf("[%s] Error: %v\n", r.ID, r.Err)
		continue
	}
	fmt.Printf("[%s] %s\n", r.ID, r.ChatResponse.Content)
}
```

### Anthropic

```go
proc, _ := batch.New(
	model.ProviderAnthropic,
	batch.WithAPIKey("your-api-key"),
	batch.WithModel(model.AnthropicModels[model.Claude4Sonnet]),
	batch.WithMaxTokens(1024),
	batch.WithPollInterval(30*time.Second),
)
```

### Gemini / Vertex AI

```go
proc, _ := batch.New(
	model.ProviderGemini,
	batch.WithAPIKey("your-api-key"),
	batch.WithModel(model.GeminiModels[model.Gemini25Flash]),
	batch.WithPollInterval(30*time.Second),
)
```

## Concurrent Fallback

For providers without native batch APIs, pass an existing LLM client. Requests run concurrently with a configurable concurrency limit.
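The fallback is in essence a bounded worker pool. A stdlib-only sketch of the pattern (illustrative only, not the library's internals) — the counting semaphore plays the role of `WithMaxConcurrency`:

```go
package main

import (
	"fmt"
	"sync"
)

// processBounded runs work for every input, but never more than
// limit goroutines at once.
func processBounded(inputs []string, limit int, work func(string) string) []string {
	results := make([]string, len(inputs))
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			results[i] = work(in)    // each goroutine writes its own index
		}(i, in)
	}
	wg.Wait()
	return results
}

func main() {
	out := processBounded([]string{"a", "b", "c"}, 2, func(s string) string {
		return s + "!"
	})
	fmt.Println(out) // [a! b! c!]
}
```

Note that each result is written to its own slice index, so no mutex is needed and per-request failures stay isolated — the same property the batch processor relies on.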
```go client, _ := llm.NewLLM(model.ProviderGroq, llm.WithAPIKey("your-api-key"), llm.WithModel(model.GroqModels[model.Llama4Scout]), ) proc, _ := batch.New( model.ProviderGroq, batch.WithLLM(client), batch.WithMaxConcurrency(10), ) resp, _ := proc.Process(ctx, requests) ``` ## Batch Embeddings ```go embedder, _ := embeddings.NewEmbedding(model.ProviderVoyage, embeddings.WithAPIKey("your-api-key"), embeddings.WithModel(model.VoyageEmbeddingModels[model.Voyage35]), ) proc, _ := batch.New( model.ProviderVoyage, batch.WithEmbedding(embedder), batch.WithMaxConcurrency(5), ) requests := []batch.Request{ {ID: "doc1", Type: batch.RequestTypeEmbedding, Texts: []string{"first document"}}, {ID: "doc2", Type: batch.RequestTypeEmbedding, Texts: []string{"second document"}}, } resp, _ := proc.Process(ctx, requests) ``` ## Provider Support | Provider | Native Batch | Discount (as of writing) | Supported Endpoints | |----------|-------------|--------------------------|---------------------| | OpenAI | ✅ | 50% | Chat, Embeddings | | Anthropic | ✅ | 50% | Messages | | Gemini | ✅ | 50% | Content, Embeddings | | Vertex AI | ✅ | ~50% | Content, Embeddings | | All others | Concurrent fallback | — | Chat, Embeddings | ## Progress Tracking ### Callback ```go proc, _ := batch.New( model.ProviderOpenAI, batch.WithAPIKey("your-api-key"), batch.WithModel(model.OpenAIModels[model.GPT4o]), batch.WithProgressCallback(func(p batch.Progress) { fmt.Printf("%d/%d completed, %d failed [%s]\n", p.Completed, p.Total, p.Failed, p.Status) }), ) ``` ### Async Channel ```go ch, err := proc.ProcessAsync(ctx, requests) for event := range ch { switch event.Type { case batch.EventItem: fmt.Printf("[%s] done\n", event.Result.ID) case batch.EventProgress: fmt.Printf("%d/%d\n", event.Progress.Completed, event.Progress.Total) case batch.EventComplete: fmt.Println("all done") case batch.EventError: fmt.Printf("batch error: %v\n", event.Err) } } ``` ## Error Handling Individual request failures never fail the 
batch. Each result carries its own error. ```go resp, err := proc.Process(ctx, requests) for _, r := range resp.Results { if r.Err != nil { continue } // use r.ChatResponse or r.EmbedResponse } fmt.Printf("Completed: %d, Failed: %d\n", resp.Completed, resp.Failed) ``` ## Options | Option | Description | Default | |--------|-------------|---------| | `WithAPIKey(key)` | API key for native batch providers | — | | `WithModel(model)` | LLM model for chat batch requests | — | | `WithEmbeddingModel(model)` | Embedding model for embedding batch requests | — | | `WithMaxTokens(n)` | Max tokens per request | 4096 | | `WithLLM(client)` | Existing LLM client for concurrent fallback | — | | `WithEmbedding(client)` | Existing embedding client for concurrent fallback | — | | `WithMaxConcurrency(n)` | Max parallel requests in concurrent mode | 10 | | `WithProgressCallback(fn)` | Progress update callback | — | | `WithPollInterval(d)` | Polling interval for native batch APIs | 30s | | `WithTimeout(d)` | Request timeout | — | | `WithOpenAIOptions(...)` | OpenAI-specific options (base URL, headers) | — | | `WithGeminiOptions(...)` | Gemini-specific options (backend) | — | --- ## Advanced > BYOM > Source: advanced/byom.md # BYOM (Bring Your Own Model) Use Ollama, LocalAI, vLLM, LM Studio, or any OpenAI-compatible inference server. ## Setup ```go // 1. Create model llamaModel := model.NewCustomModel( model.WithModelID("llama3.2"), model.WithAPIModel("llama3.2:latest"), ) // 2. Register provider ollama := llm.RegisterCustomProvider("ollama", llm.CustomProviderConfig{ BaseURL: "http://localhost:11434/v1", DefaultModel: llamaModel, }) // 3. 
Use it client, _ := llm.NewLLM(ollama) response, _ := client.SendMessages(ctx, messages, nil) ``` ## Supported Servers Any server that implements the OpenAI-compatible API: - **Ollama** — `http://localhost:11434/v1` - **LocalAI** — `http://localhost:8080/v1` - **vLLM** — `http://localhost:8000/v1` - **LM Studio** — `http://localhost:1234/v1` See `example/byom/main.go` for a complete example. --- ## Advanced > MCP Integration > Source: advanced/mcp.md # MCP (Model Context Protocol) Integration This library integrates with the official [Model Context Protocol Go SDK](https://github.com/modelcontextprotocol/go-sdk) to provide seamless access to MCP servers and their tools. ## Stdio Connection (subprocess) ```go import "github.com/joakimcarlsson/ai/tool" mcpServers := map[string]tool.MCPServer{ "filesystem": { Type: tool.MCPStdio, Command: "npx", Args: []string{"-y", "@modelcontextprotocol/server-filesystem", "/path/to/directory"}, Env: []string{"NODE_ENV=production"}, }, } mcpTools, err := tool.GetMcpTools(ctx, mcpServers) if err != nil { log.Fatal(err) } response, err := client.SendMessages(ctx, messages, mcpTools) defer tool.CloseMCPPool() ``` ## SSE Connection (HTTP) ```go mcpServers := map[string]tool.MCPServer{ "remote": { Type: tool.MCPSse, URL: "https://your-mcp-server.com/mcp", Headers: map[string]string{ "Authorization": "Bearer your-token", }, }, } mcpTools, err := tool.GetMcpTools(ctx, mcpServers) if err != nil { log.Fatal(err) } defer tool.CloseMCPPool() ``` ## Complete Example ```go package main import ( "context" "fmt" "log" "os" "github.com/joakimcarlsson/ai/message" "github.com/joakimcarlsson/ai/model" llm "github.com/joakimcarlsson/ai/providers" "github.com/joakimcarlsson/ai/tool" ) func main() { ctx := context.Background() mcpServers := map[string]tool.MCPServer{ "context7": { Type: tool.MCPStdio, Command: "npx", Args: []string{ "-y", "@upstash/context7-mcp", "--api-key", os.Getenv("CONTEXT7_API_KEY"), }, }, } mcpTools, err := tool.GetMcpTools(ctx, 
mcpServers) if err != nil { log.Fatal(err) } defer tool.CloseMCPPool() client, err := llm.NewLLM( model.ProviderOpenAI, llm.WithAPIKey(os.Getenv("OPENAI_API_KEY")), llm.WithModel(model.OpenAIModels[model.GPT4oMini]), ) if err != nil { log.Fatal(err) } messages := []message.Message{ message.NewUserMessage("Explain React hooks using Context7 to fetch the latest documentation"), } response, err := client.SendMessages(ctx, messages, mcpTools) if err != nil { log.Fatal(err) } fmt.Println(response.Content) } ``` ## StreamableHTTP Connection The newer MCP transport for HTTP-based servers: ```go mcpServers := map[string]tool.MCPServer{ "remote": { Type: tool.MCPStreamableHTTP, URL: "https://your-mcp-server.com/mcp", Headers: map[string]string{ "Authorization": "Bearer your-token", }, }, } mcpTools, err := tool.GetMcpTools(ctx, mcpServers) defer tool.CloseMCPPool() ``` ## Transport Types | Type | Constant | Use Case | |------|----------|----------| | Stdio | `tool.MCPStdio` | Local subprocess (e.g., `npx` commands) | | SSE | `tool.MCPSse` | HTTP server with Server-Sent Events | | StreamableHTTP | `tool.MCPStreamableHTTP` | HTTP server with streamable responses | ## MCPServer Config ```go type MCPServer struct { Command string // Stdio: command to run Args []string // Stdio: command arguments Env []string // Stdio: environment variables Type MCPType // Transport type URL string // SSE/StreamableHTTP: server URL Headers map[string]string // SSE/StreamableHTTP: custom HTTP headers } ``` ## Features - Supports stdio, SSE, and StreamableHTTP transports - Connection pooling for efficient reuse of MCP server connections - Custom HTTP headers for authentication on remote servers - Automatic tool discovery and registration - Compatible with all official MCP servers - Tools are namespaced with server name (e.g., `context7_search`) - Graceful cleanup with `CloseMCPPool()` --- ## Advanced > Tool Calling > Source: advanced/tools.md # Tool Calling ## Defining a Tool ```go import 
"github.com/joakimcarlsson/ai/tool" type WeatherParams struct { Location string `json:"location" desc:"City name"` Units string `json:"units" desc:"Temperature units" enum:"celsius,fahrenheit" required:"false"` } type WeatherTool struct{} func (w *WeatherTool) Info() tool.Info { return tool.NewInfo("get_weather", "Get current weather for a location", WeatherParams{}) } func (w *WeatherTool) Run(ctx context.Context, params tool.Call) (tool.Response, error) { var input WeatherParams json.Unmarshal([]byte(params.Input), &input) return tool.NewTextResponse("Sunny, 22°C"), nil } ``` ## Function Tools For simple tools that are just a function, use `functiontool.New` to skip the struct boilerplate: ```go import "github.com/joakimcarlsson/ai/tool/functiontool" type WeatherParams struct { Location string `json:"location" desc:"City name"` Units string `json:"units" desc:"Temperature units" enum:"celsius,fahrenheit" required:"false"` } weatherTool := functiontool.New("get_weather", "Get current weather for a location", func(ctx context.Context, p WeatherParams) (string, error) { return fmt.Sprintf("Sunny, 22°C in %s", p.Location), nil }, ) ``` The JSON schema is inferred from the parameter struct using the same struct tags as `tool.NewInfo`. The result is a standard `BaseTool` that works with the registry, toolsets, hooks, and agent system. ### Supported Signatures The function's first parameter can optionally be `context.Context`, and the second can be a struct for input parameters. Both are optional: ```go // With context and params functiontool.New("name", "desc", func(ctx context.Context, p Params) (string, error) { ... }) // Params only (no context) functiontool.New("name", "desc", func(p Params) (string, error) { ... }) // Context only (no input schema) functiontool.New("name", "desc", func(ctx context.Context) (string, error) { ... }) // No inputs at all functiontool.New("name", "desc", func() (string, error) { ... 
}) ``` ### Return Types The first return value determines the response type: ```go // String → tool.NewTextResponse func(p Params) (string, error) // tool.Response → passed through directly func(p Params) (tool.Response, error) // Any other type → tool.NewJSONResponse (auto-marshaled) func(p Params) (MyStruct, error) ``` ### Options ```go // Require human confirmation before execution functiontool.New("delete", "Delete records", deleteFn, functiontool.WithConfirmation()) ``` ## Using Tools with LLM ```go weatherTool := &WeatherTool{} tools := []tool.BaseTool{weatherTool} response, err := client.SendMessages(ctx, messages, tools) ``` ## Struct Tag Schema Generation Generate JSON schemas automatically from Go structs: ```go type SearchParams struct { Query string `json:"query" desc:"Search query"` Limit int `json:"limit" desc:"Max results" required:"false"` Filters []string `json:"filters" desc:"Filter tags" required:"false"` } info := tool.NewInfo("search", "Search documents", SearchParams{}) ``` Supported tags: | Tag | Description | |-----|-------------| | `json` | Parameter name | | `desc` | Parameter description | | `required` | `"true"` or `"false"` (non-pointer fields default to required) | | `enum` | Comma-separated allowed values | ## Rich Tool Responses ```go // Text response tool.NewTextResponse("Result text") // JSON response (auto-marshals any value) tool.NewJSONResponse(map[string]any{"status": "ok", "count": 42}) // File/binary response tool.NewFileResponse(pdfBytes, "application/pdf") // Image response (base64) tool.NewImageResponse(base64ImageData) // Error response tool.NewTextErrorResponse("Something went wrong") ``` ## Parsing Tool Input The agent package provides a generic helper: ```go input, err := agent.ParseToolInput[WeatherParams](params.Input) ``` ## Requiring Confirmation Set `RequireConfirmation` on a tool's `Info` to require human approval before execution: ```go func (t *DeleteTool) Info() tool.Info { info := 
tool.NewInfo("delete_records", "Delete database records", DeleteParams{}) info.RequireConfirmation = true return info } ``` Tools can also request confirmation dynamically from within `Run()`: ```go func (t *TransferTool) Run(ctx context.Context, params tool.Call) (tool.Response, error) { if amount > 10000 { if err := tool.RequestConfirmation(ctx, "Large transfer", params); err != nil { return tool.Response{}, err } } // ... } ``` Both require a `ConfirmationProvider` on the agent. See [Tool Confirmation](../agent/confirmation.md) for the full protocol. ## Toolsets For grouping, filtering, and dynamically controlling which tools are available at runtime, see [Toolsets](../agent/toolsets.md). --- ## Advanced > Structured Output > Source: advanced/structured-output.md # Structured Output Constrained generation that forces the LLM to return valid JSON matching a schema. ## Usage ```go type CodeAnalysis struct { Language string `json:"language"` Functions []string `json:"functions"` Complexity string `json:"complexity"` } schema := &schema.StructuredOutputInfo{ Name: "code_analysis", Description: "Analyze code structure", Parameters: map[string]any{ "language": map[string]any{ "type": "string", "description": "Programming language", }, "functions": map[string]any{ "type": "array", "items": map[string]any{"type": "string"}, "description": "List of function names", }, "complexity": map[string]any{ "type": "string", "enum": []string{"low", "medium", "high"}, }, }, Required: []string{"language", "functions", "complexity"}, } response, err := client.SendMessagesWithStructuredOutput(ctx, messages, nil, schema) if err != nil { log.Fatal(err) } var analysis CodeAnalysis json.Unmarshal([]byte(*response.StructuredOutput), &analysis) ``` > **Note:** > Structured output is supported by OpenAI, Gemini, Azure OpenAI, Vertex AI, Groq, OpenRouter, and xAI. Anthropic and AWS Bedrock do not currently support it. 
---

## Advanced > Cost Tracking

> Source: advanced/cost-tracking.md

# Cost Tracking

All models include built-in pricing information for cost calculation.

## LLM Models

```go
model := model.OpenAIModels[model.GPT4o]
fmt.Printf("Input cost: $%.2f per 1M tokens\n", model.CostPer1MIn)
fmt.Printf("Output cost: $%.2f per 1M tokens\n", model.CostPer1MOut)

response, err := client.SendMessages(ctx, messages, nil)
inputCost := float64(response.Usage.InputTokens) * model.CostPer1MIn / 1_000_000
outputCost := float64(response.Usage.OutputTokens) * model.CostPer1MOut / 1_000_000
```

## Image Generation Models

```go
model := model.OpenAIImageGenerationModels[model.DALLE3]

// Pricing structure: size -> quality -> cost
standardCost := model.Pricing["1024x1024"]["standard"] // $0.04
hdCost := model.Pricing["1024x1024"]["hd"]             // $0.08

// GPT Image 1 with multiple quality tiers
gptImageModel := model.OpenAIImageGenerationModels[model.GPTImage1]
lowCost := gptImageModel.Pricing["1024x1024"]["low"]       // $0.011
mediumCost := gptImageModel.Pricing["1024x1024"]["medium"] // $0.042
highCost := gptImageModel.Pricing["1024x1024"]["high"]     // $0.167
```

## Audio Generation Models

```go
model := model.ElevenLabsAudioModels[model.ElevenTurboV2_5]
fmt.Printf("Cost per 1M chars: $%.2f\n", model.CostPer1MChars)
fmt.Printf("Max characters per request: %d\n", model.MaxCharacters)
fmt.Printf("Supports streaming: %v\n", model.SupportsStreaming)

response, err := client.GenerateAudio(ctx, text, audio.WithVoiceID("voice-id"))
cost := float64(response.Usage.Characters) * model.CostPer1MChars / 1_000_000
fmt.Printf("Cost: $%.4f\n", cost)
```

---

## Advanced > Prompt Templates

> Source: advanced/prompt-templates.md

# Prompt Templates

A template engine for building dynamic prompts with variable substitution, built-in functions, caching, and validation.
## Basic Usage

```go
import "github.com/joakimcarlsson/ai/prompt"

result, err := prompt.Process("Hello, {{.name}}!", map[string]any{
	"name": "World",
})
// "Hello, World!"
```

## Reusable Templates

```go
tmpl, err := prompt.New("You are {{.role}}. Help with {{.task}}.")
if err != nil {
	log.Fatal(err)
}

result, err := tmpl.Process(map[string]any{
	"role": "a coding assistant",
	"task": "debugging",
})
// "You are a coding assistant. Help with debugging."
```

## Caching

Thread-safe template caching avoids re-parsing the same template repeatedly.

```go
cache := prompt.NewCache()

tmpl, err := prompt.New("You are {{.role}}.",
	prompt.WithCache(cache),
	prompt.WithName("system"), // cache key
)
```

When using a cache without `WithName`, the template source is hashed automatically as the cache key.

## Validation

Require specific variables to be present in the data map:

```go
_, err := prompt.Process("Hello, {{.name}}!", map[string]any{},
	prompt.WithRequired("name"),
)
// error: missing required variables: name
```

## Strict Mode

Error on any missing variable instead of using zero values:

```go
tmpl, err := prompt.New("{{.name}} is {{.age}} years old.",
	prompt.WithStrictMode(),
)

_, err = tmpl.Process(map[string]any{"name": "Alice"})
// error: template execution fails because .age is missing
```

## Built-in Functions

### String

| Function | Description | Example |
|----------|-------------|---------|
| `upper` | Uppercase | `{{upper .name}}` |
| `lower` | Lowercase | `{{lower .name}}` |
| `title` | Title case | `{{title .name}}` |
| `trim` | Trim whitespace | `{{trim .text}}` |
| `trimPrefix` | Remove prefix | `{{trimPrefix "Mr. " .name}}` |
| `trimSuffix` | Remove suffix | `{{trimSuffix "." .text}}` |
| `replace` | Replace all | `{{replace "old" "new" .text}}` |
| `contains` | Check substring | `{{if contains .text "error"}}...{{end}}` |
| `hasPrefix` | Check prefix | `{{if hasPrefix .name "Dr."}}...{{end}}` |
| `hasSuffix` | Check suffix | `{{if hasSuffix .file ".go"}}...{{end}}` |

### Collections

| Function | Description | Example |
|----------|-------------|---------|
| `join` | Join slice | `{{join ", " .items}}` |
| `split` | Split string | `{{split "," .csv}}` |
| `first` | First element | `{{first .items}}` |
| `last` | Last element | `{{last .items}}` |
| `list` | Create slice | `{{list "a" "b" "c"}}` |

### Comparison

| Function | Description | Example |
|----------|-------------|---------|
| `eq` | Equal | `{{if eq .role "admin"}}...{{end}}` |
| `ne` / `neq` | Not equal | `{{if ne .status "done"}}...{{end}}` |
| `lt` | Less than | `{{if lt .count 10}}...{{end}}` |
| `le` | Less or equal | `{{if le .count 10}}...{{end}}` |
| `gt` | Greater than | `{{if gt .count 0}}...{{end}}` |
| `ge` | Greater or equal | `{{if ge .count 1}}...{{end}}` |

### Defaults

| Function | Description | Example |
|----------|-------------|---------|
| `default` | Default value | `{{default "anonymous" .name}}` |
| `coalesce` | First non-empty | `{{coalesce .nickname .name "unknown"}}` |
| `empty` | Check if empty | `{{if empty .list}}...{{end}}` |
| `ternary` | Conditional | `{{ternary .admin "admin" "user"}}` |

### Formatting

| Function | Description | Example |
|----------|-------------|---------|
| `indent` | Indent text | `{{indent 4 .code}}` |
| `nindent` | Newline + indent | `{{nindent 4 .code}}` |
| `quote` | Double quote | `{{quote .name}}` |
| `squote` | Single quote | `{{squote .name}}` |

## Custom Functions

Add your own template functions:

```go
import "text/template"

result, err := prompt.Process("{{shout .name}}", data,
	prompt.WithFuncs(template.FuncMap{
		"shout": func(s string) string {
			return strings.ToUpper(s) + "!!!"
		},
	}),
)
```

## Options

| Option | Description |
|--------|-------------|
| `prompt.WithCache(c)` | Enable template caching |
| `prompt.WithName(name)` | Set template name (used as cache key) |
| `prompt.WithRequired(vars...)` | Require specific variables |
| `prompt.WithStrictMode()` | Error on missing variables |
| `prompt.WithFuncs(funcs)` | Add custom template functions |

## With Agent Instruction Templates

The prompt package powers the agent's [instruction templates](../agent/instruction-templates.md) feature:

```go
myAgent := agent.New(llmClient,
	agent.WithSystemPrompt("You are {{.role}}. The user's name is {{.userName}}."),
	agent.WithState(map[string]any{
		"role":     "a helpful assistant",
		"userName": "Alice",
	}),
)
```

---

## Advanced > OpenTelemetry Tracing

> Source: advanced/tracing.md

# OpenTelemetry Tracing

Built-in OpenTelemetry instrumentation for all provider calls and agent execution. Includes traces, metrics, and structured log records following [GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/). When no providers are configured, everything is a zero-cost no-op.

## Setup

Use the built-in setup helper to initialize traces, metrics, and logs in one call:

```go
import (
	"github.com/joakimcarlsson/ai/tracing"

	"go.opentelemetry.io/otel/sdk/resource"
	semconv "go.opentelemetry.io/otel/semconv/v1.36.0"
)

res, _ := resource.New(ctx, resource.WithAttributes(
	semconv.ServiceNameKey.String("my-ai-service"),
	semconv.ServiceVersionKey.String("1.0.0"),
))

providers, _ := tracing.New(ctx,
	tracing.WithResource(res),
	tracing.WithOTLPEndpoint("localhost:4318"),
)
defer providers.Shutdown(ctx)
```

This creates and globally registers a `TracerProvider`, `MeterProvider`, and `LoggerProvider` — all configured with OTLP HTTP exporters pointing at the given endpoint.
### Setup Options

| Option | Description |
|--------|-------------|
| `WithResource(r)` | Set service resource (name, version, environment) |
| `WithOTLPEndpoint(url)` | Configure OTLP HTTP exporters for all signals |
| `WithSpanProcessors(p...)` | Register custom span processors |
| `WithMetricReaders(r...)` | Register custom metric readers |
| `WithLogProcessors(p...)` | Register custom log processors |

If no `WithOTLPEndpoint` is provided, the helper checks the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable.

### Manual Setup

You can also configure providers manually using the standard OpenTelemetry SDK:

```go
import (
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

exporter, _ := stdouttrace.New(stdouttrace.WithPrettyPrint())
tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
defer tp.Shutdown(ctx)

otel.SetTracerProvider(tp)
```

All subsequent LLM calls, tool executions, and agent runs will produce spans and metrics.

## Span Hierarchy

When using the agent framework, spans form a parent-child tree:

```
invoke_agent {agent_name}
├── generate_content {model}   (LLM turn 1)
├── execute_tool {tool_name}   (single tool call)
├── generate_content {model}   (LLM turn 2)
└── ...
```

When the LLM requests multiple tool calls at once, they are grouped under a parent span:

```
invoke_agent {agent_name}
├── generate_content {model}
├── execute_tools              (merged parent for 2+ tools)
│   ├── execute_tool {tool_a}
│   └── execute_tool {tool_b}
└── generate_content {model}
```

When using providers standalone (no agent), each call produces a root span:

```
generate_content {model}
generate_embeddings {model}
rerank {model}
generate_audio {model}
generate_image {model}
transcribe {model}
fim_complete {model}
```

## Instrumented Operations

Every provider package is instrumented at the public API level — one span per call, covering all underlying providers.

| Package | Span Name | Methods |
|---------|-----------|---------|
| `providers` (LLM) | `generate_content` | `SendMessages`, `StreamResponse`, and structured output variants |
| `embeddings` | `generate_embeddings` | `GenerateEmbeddings`, `GenerateMultimodalEmbeddings`, `GenerateContextualizedEmbeddings` |
| `rerankers` | `rerank` | `Rerank` |
| `audio` | `generate_audio` | `GenerateAudio`, `StreamAudio` |
| `image_generation` | `generate_image` | `GenerateImage`, `GenerateImageStreaming` |
| `transcription` | `transcribe` / `translate` | `Transcribe`, `Translate` |
| `fim` | `fim_complete` | `Complete`, `CompleteStream` |
| `agent` | `invoke_agent` | `Chat`, `ChatStream`, `Continue`, `ContinueStream` |
| `agent` (tools) | `execute_tool` | Each tool call during agent execution |

## Span Attributes

Spans carry GenAI semantic convention attributes.

### LLM (`generate_content`)

| Attribute | When |
|-----------|------|
| `gen_ai.system` | Always |
| `gen_ai.request.model` | Always |
| `gen_ai.request.max_tokens` | Always |
| `gen_ai.request.temperature` | If set |
| `gen_ai.request.top_p` | If set |
| `gen_ai.usage.input_tokens` | On completion |
| `gen_ai.usage.output_tokens` | On completion |
| `gen_ai.usage.cache_creation_tokens` | If non-zero |
| `gen_ai.usage.cache_read_tokens` | If non-zero |
| `gen_ai.response.finish_reason` | On completion |
| `gen_ai.response.tool_call_count` | If tool calls present |

### Agent (`invoke_agent`)

| Attribute | When |
|-----------|------|
| `gen_ai.agent.name` | Always |
| `gen_ai.usage.input_tokens` | On completion (aggregated) |
| `gen_ai.usage.output_tokens` | On completion (aggregated) |
| `gen_ai.agent.total_turns` | On completion |
| `gen_ai.agent.total_tool_calls` | On completion |

### Tool (`execute_tool`)

| Attribute | When |
|-----------|------|
| `gen_ai.tool.name` | Always |
| `gen_ai.tool.call_id` | Always |

## Streaming

Streaming calls (`StreamResponse`, `StreamAudio`, `CompleteStream`) are fully traced.
The span covers the entire stream lifetime — from the initial call until the channel closes. Response attributes (token usage, finish reason) are recorded when the final event arrives.

## Metrics

Every provider call records two metrics via the global `MeterProvider`:

| Metric | Type | Unit | Description |
|--------|------|------|-------------|
| `gen_ai.client.operation.duration` | Float64Histogram | `s` | Duration of each provider call |
| `gen_ai.client.token.usage` | Int64Counter | `{token}` | Token consumption per call |

Both metrics carry these attributes:

| Attribute | Description |
|-----------|-------------|
| `gen_ai.operation.name` | Operation type (`generate_content`, `generate_embeddings`, `rerank`, etc.) |
| `gen_ai.system` | Provider name (`openai`, `anthropic`, `voyage`, etc.) |
| `gen_ai.request.model` | Model identifier |
| `error.type` | Error message (only on failed calls) |

The token usage counter additionally carries `gen_ai.token.type` (`input` or `output`) to distinguish token direction. Token metrics are only recorded when the count is non-zero.

### Metrics Setup

Metrics work the same as traces — configure a global `MeterProvider`:

```go
import (
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

exporter, _ := otlpmetrichttp.New(ctx)
mp := sdkmetric.NewMeterProvider(sdkmetric.WithReader(
	sdkmetric.NewPeriodicReader(exporter),
))
defer mp.Shutdown(ctx)

otel.SetMeterProvider(mp)
```

## Log Records

LLM calls emit OpenTelemetry log records tied to the active span. Log bodies are structured JSON following GenAI semantic conventions:

| Event Name | Body Structure |
|------------|----------------|
| `gen_ai.system.message` | `{"content": "..."}` |
| `gen_ai.user.message` | `{"content": "..."}` |
| `gen_ai.choice` | `{"index": 0, "content": "...", "finish_reason": "..."}` |

Log records require a global `LoggerProvider` to be configured. Without one, they are silently dropped.

### Content Capture

Message content is **elided by default** for privacy. To include actual message content in log records, set:

```bash
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true
```

When content capture is disabled, log bodies contain a placeholder instead of the actual content.

## Retry Visibility

When a provider call is retried (rate limits, transient errors), each retry attempt is recorded as a span event on the `generate_content` span:

```
Event: "retry"
  attempt = 1
  retry_after_ms = 2000
  error = "429 Too Many Requests"
```

This gives visibility into retries without creating additional spans, making it easy to diagnose latency spikes caused by rate limiting.

## OTLP Export

To export traces to Jaeger, Grafana Tempo, Datadog, or any OTLP-compatible backend:

```go
import (
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.36.0"
)

exporter, _ := otlptracehttp.New(ctx)

res, _ := resource.New(ctx,
	resource.WithAttributes(
		semconv.ServiceNameKey.String("my-ai-service"),
		semconv.ServiceVersionKey.String("1.0.0"),
	),
)

tp := sdktrace.NewTracerProvider(
	sdktrace.WithBatcher(exporter),
	sdktrace.WithResource(res),
)
defer tp.Shutdown(ctx)

otel.SetTracerProvider(tp)
```

Configure the OTLP endpoint via environment variable:

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
```

## Standalone Provider Tracing

Tracing works without the agent framework.
Any provider call creates spans and records metrics automatically:

```go
otel.SetTracerProvider(tp)

client, _ := llm.NewLLM(model.ProviderAnthropic,
	llm.WithAPIKey(os.Getenv("ANTHROPIC_API_KEY")),
	llm.WithModel(model.AnthropicModels[model.Claude4Sonnet]),
)

// This call produces a "generate_content claude-sonnet-4-6-20250514" span
// and records duration + token usage metrics
response, _ := client.SendMessages(ctx, messages, nil)
```

The same applies to embeddings, audio, image generation, transcription, rerankers, and FIM.

## Complete Example

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"go.opentelemetry.io/otel/sdk/resource"
	semconv "go.opentelemetry.io/otel/semconv/v1.36.0"

	"github.com/joakimcarlsson/ai/agent"
	"github.com/joakimcarlsson/ai/model"
	llm "github.com/joakimcarlsson/ai/providers"
	"github.com/joakimcarlsson/ai/tool/functiontool"
	"github.com/joakimcarlsson/ai/tracing"
)

func main() {
	ctx := context.Background()

	res, _ := resource.New(ctx, resource.WithAttributes(
		semconv.ServiceNameKey.String("my-ai-service"),
	))

	providers, _ := tracing.New(ctx,
		tracing.WithResource(res),
		tracing.WithOTLPEndpoint("localhost:4318"),
	)
	defer func() { _ = providers.Shutdown(ctx) }()

	client, _ := llm.NewLLM(model.ProviderOpenAI,
		llm.WithAPIKey(os.Getenv("OPENAI_API_KEY")),
		llm.WithModel(model.OpenAIModels[model.GPT5Nano]),
	)

	timeTool := functiontool.New(
		"get_time",
		"Get the current time",
		func(_ context.Context, p struct{}) (string, error) {
			return "14:30 UTC", nil
		},
	)

	myAgent := agent.New(client,
		agent.WithTools(timeTool),
	)

	resp, err := myAgent.Chat(ctx, "What time is it?")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp.Content)
}
```

This produces spans:

```
invoke_agent
├── generate_content gpt-5-nano
├── execute_tool get_time
└── generate_content gpt-5-nano
```

---

## Advanced > Configuration

> Source: advanced/configuration.md

# Configuration

## LLM Client Options

```go
client, err := llm.NewLLM(
	model.ProviderOpenAI,
	llm.WithAPIKey("your-key"),
	llm.WithModel(model.OpenAIModels[model.GPT4o]),
	llm.WithMaxTokens(2000),
	llm.WithTemperature(0.7),
	llm.WithTopP(0.9),
	llm.WithTimeout(30*time.Second),
	llm.WithStopSequences("STOP", "END"),
)
```

## Embedding Client Options

```go
embedder, err := embeddings.NewEmbedding(
	model.ProviderVoyage,
	embeddings.WithAPIKey(""),
	embeddings.WithModel(model.VoyageEmbeddingModels[model.Voyage35]),
	embeddings.WithBatchSize(100),
	embeddings.WithTimeout(30*time.Second),
	embeddings.WithVoyageOptions(
		embeddings.WithInputType("document"),
		embeddings.WithOutputDimension(1024),
		embeddings.WithOutputDtype("float"),
	),
)
```

## Reranker Client Options

```go
reranker, err := rerankers.NewReranker(
	model.ProviderVoyage,
	rerankers.WithAPIKey(""),
	rerankers.WithModel(model.VoyageRerankerModels[model.Rerank25Lite]),
	rerankers.WithTopK(10),
	rerankers.WithReturnDocuments(true),
	rerankers.WithTruncation(true),
	rerankers.WithTimeout(30*time.Second),
)
```

## Image Generation Client Options

```go
// OpenAI/xAI
client, err := image_generation.NewImageGeneration(
	model.ProviderOpenAI,
	image_generation.WithAPIKey("your-key"),
	image_generation.WithModel(model.OpenAIImageGenerationModels[model.DALLE3]),
	image_generation.WithTimeout(60*time.Second),
	image_generation.WithOpenAIOptions(
		image_generation.WithOpenAIBaseURL("custom-endpoint"),
	),
)

// Gemini
client, err := image_generation.NewImageGeneration(
	model.ProviderGemini,
	image_generation.WithAPIKey("your-key"),
	image_generation.WithModel(model.GeminiImageGenerationModels[model.Imagen4]),
	image_generation.WithTimeout(60*time.Second),
	image_generation.WithGeminiOptions(
		image_generation.WithGeminiBackend(genai.BackendVertexAI),
	),
)
```

## Audio Generation Client Options

```go
client, err := audio.NewAudioGeneration(
	model.ProviderElevenLabs,
	audio.WithAPIKey("your-key"),
	audio.WithModel(model.ElevenLabsAudioModels[model.ElevenTurboV2_5]),
	audio.WithTimeout(30*time.Second),
	audio.WithElevenLabsOptions(
		audio.WithElevenLabsBaseURL("custom-endpoint"),
	),
)
```

## Speech-to-Text Client Options

```go
client, err := transcription.NewSpeechToText(
	model.ProviderOpenAI,
	transcription.WithAPIKey("your-key"),
	transcription.WithModel(model.OpenAITranscriptionModels[model.GPT4oTranscribe]),
	transcription.WithTimeout(30*time.Second),
)
```

## Provider-Specific Options

```go
// Anthropic
llm.WithAnthropicOptions(
	llm.WithAnthropicBeta("beta-feature"),
	llm.WithAnthropicBedrock(true),
	llm.WithAnthropicDisableCache(),
	llm.WithAnthropicShouldThinkFn(func(userMsg string) bool {
		return strings.Contains(userMsg, "think")
	}),
)

// OpenAI
llm.WithOpenAIOptions(
	llm.WithOpenAIBaseURL("custom-endpoint"),
	llm.WithOpenAIExtraHeaders(map[string]string{"Custom-Header": "value"}),
	llm.WithOpenAIDisableCache(),
	llm.WithReasoningEffort("high"), // "low", "medium", "high"
	llm.WithOpenAIFrequencyPenalty(0.5),
	llm.WithOpenAIPresencePenalty(0.3),
	llm.WithOpenAISeed(42),
	llm.WithOpenAIParallelToolCalls(false),
)

// Gemini
llm.WithGeminiOptions(
	llm.WithGeminiDisableCache(),
	llm.WithGeminiFrequencyPenalty(0.5),
	llm.WithGeminiPresencePenalty(0.3),
	llm.WithGeminiSeed(42),
)

// Azure OpenAI
llm.WithAzureOptions(
	llm.WithAzureEndpoint("https://your-resource.openai.azure.com"),
	llm.WithAzureAPIVersion("2024-02-15-preview"),
)

// Bedrock (via Anthropic)
llm.WithAnthropicOptions(
	llm.WithAnthropicBedrock(true),
)
llm.WithBedrockOptions(...)
```

## Retry Configuration

All LLM providers include automatic retry with exponential backoff and jitter. Each provider has optimized defaults:

```go
// Default retry config (used by most providers)
llm.DefaultRetryConfig() // retries: 429, 500, 502, 503, 504

// Provider-specific configs
llm.OpenAIRetryConfig()    // retries: 429, 500
llm.AnthropicRetryConfig() // retries: 429, 529
llm.GeminiRetryConfig()    // no Retry-After header support
llm.MistralRetryConfig()   // retries: 429, 500, 502, 503
```

| Setting | Default | Description |
|---------|---------|-------------|
| `MaxRetries` | 3 | Maximum retry attempts |
| `BaseBackoffMs` | 2000 | Initial backoff in milliseconds |
| `JitterPercent` | 0.2 | Jitter added to backoff (20%) |
| `RetryStatusCodes` | varies | HTTP status codes that trigger retries |
| `CheckRetryAfter` | true | Respect the `Retry-After` header |

Retries use exponential backoff: `base * 2^(attempt-1) + jitter`. When `CheckRetryAfter` is enabled and the server sends a `Retry-After` header, that value takes precedence.

## Agent Options

See the [Agent Framework Overview](../agent/overview.md) for a full table of agent configuration options.