Skip to content

Reasoning / Extended Thinking

Models that support chain-of-thought reasoning can be configured to control reasoning depth. Some providers also expose the model's thinking process via EventThinkingDelta streaming events.

Reasoning effort

Each LLM vendor module exports its own WithReasoningEffort (or WithThinkingLevel) helper.

import llmopenai "github.com/joakimcarlsson/ai/llm/openai"

client := llmopenai.NewLLM(
    llmopenai.WithAPIKey("your-key"),
    llmopenai.WithModel(model.OpenAIModels[model.O4Mini]),
    llmopenai.WithMaxTokens(16000),
    llmopenai.WithReasoningEffort(llmopenai.ReasoningEffortHigh),
)
Level Constant
Low llmopenai.ReasoningEffortLow
Medium llmopenai.ReasoningEffortMedium
High llmopenai.ReasoningEffortHigh

OpenAI's Chat Completions API does not expose thinking content. The model reasons internally but EventThinkingDelta events are not emitted.

import llmanthropic "github.com/joakimcarlsson/ai/llm/anthropic"

client := llmanthropic.NewLLM(
    llmanthropic.WithAPIKey("your-key"),
    llmanthropic.WithModel(model.AnthropicModels[model.Claude45Sonnet]),
    llmanthropic.WithMaxTokens(16000),
    llmanthropic.WithReasoningEffort(llmanthropic.ReasoningEffortHigh),
)
Level Constant
Low llmanthropic.ReasoningEffortLow
Medium llmanthropic.ReasoningEffortMedium
High llmanthropic.ReasoningEffortHigh
Max llmanthropic.ReasoningEffortMax
import llmgemini "github.com/joakimcarlsson/ai/llm/gemini"

client := llmgemini.NewLLM(
    llmgemini.WithAPIKey("your-key"),
    llmgemini.WithModel(model.GeminiModels[model.Gemini3Pro]),
    llmgemini.WithMaxTokens(16000),
    llmgemini.WithThinkingLevel(llmgemini.ThinkingLevelHigh),
)
Level Constant
Minimal llmgemini.ThinkingLevelMinimal
Low llmgemini.ThinkingLevelLow
Medium llmgemini.ThinkingLevelMedium
High llmgemini.ThinkingLevelHigh

Streaming thinking events

Anthropic, Gemini, and OpenAI-compatible providers (Ollama, vLLM, etc.) that expose reasoning deltas stream thinking content via EventThinkingDelta:

import "github.com/joakimcarlsson/ai/types"

for event := range client.StreamResponse(ctx, messages, nil) {
    switch event.Type {
    case types.EventThinkingDelta:
        fmt.Print(event.Thinking)
    case types.EventContentDelta:
        fmt.Print(event.Content)
    case types.EventComplete:
        fmt.Printf("\nTokens: %d in, %d out\n",
            event.Response.Usage.InputTokens,
            event.Response.Usage.OutputTokens,
        )
    case types.EventError:
        log.Fatal(event.Error)
    }
}

The same pattern works with agents via ChatStream:

for event := range myAgent.ChatStream(ctx, "Think about this carefully...") {
    switch event.Type {
    case types.EventThinkingDelta:
        fmt.Print(event.Thinking)
    case types.EventContentDelta:
        fmt.Print(event.Content)
    }
}

OpenAI-compatible providers (Ollama, vLLM)

Reasoning models served via OpenAI-compatible APIs (Qwen, DeepSeek, etc.) stream thinking content over the same reasoning delta channel. Use llm/openai with a custom base URL and a custom model:

import (
    llmopenai "github.com/joakimcarlsson/ai/llm/openai"
    "github.com/joakimcarlsson/ai/model"
)

ollama := llmopenai.NewLLM(
    llmopenai.WithAPIKey("ollama"),
    llmopenai.WithBaseURL("http://localhost:11434/v1"),
    llmopenai.WithModel(model.Model{
        ID:               "qwen3:14b",
        Name:             "Qwen3 14B",
        APIModel:         "qwen3:14b",
        Provider:         model.ProviderOpenAI,
        ContextWindow:    32768,
        DefaultMaxTokens: 4096,
        CanReason:        true,
    }),
    llmopenai.WithMaxTokens(4096),
)