LLM Providers
Each native LLM vendor is published as its own Go module under llm/. The
client returned from a vendor's NewLLM(...) satisfies the llm.LLM
interface, so once you've constructed it the rest of your code is
vendor-agnostic.
Creating a client
import (
	llmopenai "github.com/joakimcarlsson/ai/llm/openai"
	"github.com/joakimcarlsson/ai/model"
)
client := llmopenai.NewLLM(
	llmopenai.WithAPIKey("your-api-key"),
	llmopenai.WithModel(model.OpenAIModels[model.GPT4o]),
	llmopenai.WithMaxTokens(1000),
)
For Anthropic instead:
import llmanthropic "github.com/joakimcarlsson/ai/llm/anthropic"
client := llmanthropic.NewLLM(
	llmanthropic.WithAPIKey("..."),
	llmanthropic.WithModel(model.AnthropicModels[model.Claude45Sonnet]),
	llmanthropic.WithMaxTokens(1000),
)
Sending messages
import "github.com/joakimcarlsson/ai/message"
response, err := client.SendMessages(ctx, []message.Message{
	message.NewUserMessage("Hello, how are you?"),
}, nil)
if err != nil {
	log.Fatal(err) // check the error before touching response
}
fmt.Println(response.Content)
Streaming
import "github.com/joakimcarlsson/ai/types"
stream := client.StreamResponse(ctx, messages, nil)
for event := range stream {
	switch event.Type {
	case types.EventContentDelta:
		fmt.Print(event.Content)
	case types.EventComplete:
		fmt.Printf("\nTokens: %d in / %d out\n",
			event.Response.Usage.InputTokens,
			event.Response.Usage.OutputTokens)
	case types.EventError:
		log.Fatal(event.Error)
	}
}
Multimodal (images)
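imageData, err := os.ReadFile("image.png")
if err != nil {
	log.Fatal(err) // do not send an empty attachment
}
msg := message.NewUserMessage("What's in this image?")
msg.AddAttachment(message.Attachment{
	MIMEType: "image/png",
	Data:     imageData,
})
response, err := client.SendMessages(ctx, []message.Message{msg}, nil)
if err != nil {
	log.Fatal(err)
}
fmt.Println(response.Content)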
Common options
Every vendor exports the standard set, shown here with the llmopenai prefix:
llmopenai.WithAPIKey("...")
llmopenai.WithModel(model.OpenAIModels[model.GPT4o])
llmopenai.WithMaxTokens(2000)
llmopenai.WithTemperature(0.7)
llmopenai.WithTopP(0.9)
llmopenai.WithTopK(40)
llmopenai.WithStopSequences("STOP", "END")
llmopenai.WithTimeout(30 * time.Second)
llmopenai.WithToolChoice(llm.ToolChoice{Mode: llm.ToolChoiceRequired})
WithToolChoice
WithToolChoice is shared by the OpenAI, Anthropic, and Gemini modules
(OpenAI-compatible providers inherit it through llm/openai). It takes the
vendor-neutral llm.ToolChoice type: Mode is ToolChoiceAuto (default),
ToolChoiceNone, ToolChoiceRequired, or ToolChoiceSpecific with a Name.
It maps to each provider's native field (tool_choice for OpenAI/Anthropic,
toolConfig.functionCallingConfig for Gemini) and is emitted only when tools
are supplied. ToolChoiceSpecific with an empty Name is rejected before the
request is sent.
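The validation and emission rules above can be sketched without the library. The types below are local stand-ins that mirror the names used in this section; they are not the module's actual definitions. The wire values ("auto", "none", "required", and the function object) follow OpenAI's public Chat Completions format only, and rejecting an empty Name even when no tools are supplied is an assumption about ordering.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolChoiceMode and ToolChoice are local stand-ins for llm.ToolChoice.
type ToolChoiceMode int

const (
	ToolChoiceAuto ToolChoiceMode = iota // zero value, so Auto is the default
	ToolChoiceNone
	ToolChoiceRequired
	ToolChoiceSpecific
)

type ToolChoice struct {
	Mode ToolChoiceMode
	Name string
}

var errMissingName = errors.New("tool choice: ToolChoiceSpecific requires a Name")

// openAIToolChoice returns the value for the tool_choice field, or nil when
// the field should be omitted from the request (hypothetical helper).
func openAIToolChoice(tc ToolChoice, toolCount int) (any, error) {
	// Reject the invalid combination before anything is sent.
	if tc.Mode == ToolChoiceSpecific && tc.Name == "" {
		return nil, errMissingName
	}
	// Without tools there is nothing to choose between, so omit the field.
	if toolCount == 0 {
		return nil, nil
	}
	switch tc.Mode {
	case ToolChoiceNone:
		return "none", nil
	case ToolChoiceRequired:
		return "required", nil
	case ToolChoiceSpecific:
		return map[string]any{
			"type":     "function",
			"function": map[string]any{"name": tc.Name},
		}, nil
	default:
		return "auto", nil
	}
}

func main() {
	v, err := openAIToolChoice(ToolChoice{Mode: ToolChoiceRequired}, 2)
	fmt.Println(v, err)

	_, err = openAIToolChoice(ToolChoice{Mode: ToolChoiceSpecific}, 2)
	fmt.Println(err)
}
```

Keeping the mode enum vendor-neutral and translating it at the edge is what lets the same ToolChoice value work across providers whose native fields differ.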
WithTopK on the OpenAI client
OpenAI's and Azure's own APIs reject top_k with an HTTP 400, so the value set by
llmopenai.WithTopK is sent only when a custom base URL points at an
OpenAI-compatible provider that accepts it (Together, OpenRouter, Fireworks, ...).
Against OpenAI or Azure proper it has no effect. Native providers (Anthropic,
Gemini, Bedrock) honor WithTopK directly. WithStopSequences sends every sequence
provided, except that the OpenAI client caps the list at the API's limit of 4.
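To make the gating concrete, here is a minimal standard-library sketch of how a request builder could apply both rules. The helper names, the host checks used to recognize OpenAI and Azure, and the choice to keep the first four stop sequences are all assumptions for illustration; the library's own detection logic may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

const maxOpenAIStops = 4 // limit stated in this section

// acceptsTopK reports whether baseURL points somewhere other than OpenAI or
// Azure OpenAI. The host matching is an assumption, not the library's logic.
func acceptsTopK(baseURL string) bool {
	if baseURL == "" {
		return false // default endpoint: OpenAI proper
	}
	u, err := url.Parse(baseURL)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	if host == "" || host == "api.openai.com" || strings.HasSuffix(host, ".openai.azure.com") {
		return false
	}
	return true
}

// samplingParams builds only the fields that should reach the wire.
func samplingParams(baseURL string, topK int, stops []string) map[string]any {
	params := map[string]any{}
	if topK > 0 && acceptsTopK(baseURL) {
		params["top_k"] = topK
	}
	if len(stops) > maxOpenAIStops {
		stops = stops[:maxOpenAIStops] // assumed: keep the first four
	}
	if len(stops) > 0 {
		params["stop"] = stops
	}
	return params
}

func main() {
	stops := []string{"A", "B", "C", "D", "E"}
	fmt.Println(samplingParams("", 40, stops))
	fmt.Println(samplingParams("https://openrouter.ai/api/v1", 40, stops))
}
```

Dropping the field silently, rather than returning an error, matches the "has no effect" behavior described above and keeps one option list portable across endpoints.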
Vendor-specific options
OpenAI:
llmopenai.WithBaseURL("https://custom-endpoint")
llmopenai.WithExtraHeaders(map[string]string{"X-My-Header": "value"})
llmopenai.WithReasoningEffort(llmopenai.ReasoningEffortHigh)
llmopenai.WithFrequencyPenalty(0.5)
llmopenai.WithPresencePenalty(0.5)
llmopenai.WithSeed(42)
llmopenai.WithParallelToolCalls(false)
llmopenai.WithLogitBias(map[string]int{"1212": 5, "50256": -100}) // bias/ban tokens by id
llmopenai.WithLogprobs(3) // logprobs:true + top_logprobs:3
llmopenai.WithN(3) // n completions per request
Sampling knobs that change the response shape
WithLogitBias, WithLogprobs, and WithN live on the OpenAI client and so
also cover every OpenAI-compatible provider (Groq, OpenRouter, xAI, Together,
Fireworks, DeepSeek, Mistral, Ollama). They are emitted only when set, and
are OpenAI-only: Anthropic supports none of them, and Gemini's
candidateCount (the n equivalent) is out of scope — those providers never
receive the fields.
WithLogitBias maps token IDs (tokenizer ids, OpenAI's wire shape) to a bias from
-100 (ban) to 100 (force).
WithLogprobs(n) requests per-token log probabilities with up to n alternatives
per position; the result lands on Response.LogProbs ([]llm.TokenLogProb), nil
when not requested.
WithN(n) requests n completions; all land on Response.Choices ([]llm.Choice).
The top-level Content/FinishReason/ToolCalls/LogProbs mirror choice 0, so
single-completion callers are unaffected (Choices is empty when n is unset or
1). Streaming with n > 1 is not supported; use the non-streaming SendMessages
path.
logit_bias is rejected by reasoning-tier models (the gpt-5 family) with an
HTTP 400; use a classic chat model such as gpt-4o-mini when you need it.
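The "top-level fields mirror choice 0" contract is easiest to see with stand-in types. The sketch below does not use the library: Response and Choice are local types carrying only the Content and FinishReason fields named above, and newResponse and contents are hypothetical helpers that show one way the described shape could be produced and consumed.

```go
package main

import "fmt"

// Choice and Response are local stand-ins, not the library's types.
type Choice struct {
	Content      string
	FinishReason string
}

type Response struct {
	Content      string
	FinishReason string
	Choices      []Choice
}

// newResponse mirrors choice 0 into the top-level fields and fills Choices
// only when more than one completion came back.
func newResponse(completions []Choice) Response {
	var r Response
	if len(completions) == 0 {
		return r
	}
	r.Content = completions[0].Content
	r.FinishReason = completions[0].FinishReason
	if len(completions) > 1 {
		r.Choices = completions
	}
	return r
}

// contents returns every completion's text, whether or not n was set, so
// callers need a single code path.
func contents(r Response) []string {
	if len(r.Choices) == 0 {
		return []string{r.Content}
	}
	out := make([]string, 0, len(r.Choices))
	for _, c := range r.Choices {
		out = append(out, c.Content)
	}
	return out
}

func main() {
	single := newResponse([]Choice{{Content: "hi", FinishReason: "stop"}})
	multi := newResponse([]Choice{
		{Content: "a", FinishReason: "stop"},
		{Content: "b", FinishReason: "stop"},
		{Content: "c", FinishReason: "length"},
	})
	fmt.Println(contents(single), contents(multi))
}
```

A caller that only ever reads the top-level Content keeps working when n is raised later; only code that wants the alternatives needs to look at Choices.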
Anthropic:
llmanthropic.WithBedrock(true) // route through AWS Bedrock
llmanthropic.WithDisableCache()
llmanthropic.WithReasoningEffort(llmanthropic.ReasoningEffortHigh)
Gemini:
import llmgemini "github.com/joakimcarlsson/ai/llm/gemini"
llmgemini.WithThinkingLevel(llmgemini.ThinkingLevelHigh)
llmgemini.WithFrequencyPenalty(0.5)
llmgemini.WithSeed(42)
Provider built-in tools
Server-side built-in tools (web search, code execution, file search) run
inside the provider's infrastructure. They're opt-in per-client; results land
inline in Response.Content, with structured metadata under
Response.ProviderMetadata. See Tool Calling
for the full picture; below is the per-provider surface.
Anthropic — web_search:
llmanthropic.WithWebSearch(llmanthropic.WebSearchConfig{
	MaxUses:        5,
	AllowedDomains: []string{"go.dev"},
	BlockedDomains: nil,
	UserLocation: &llmanthropic.WebSearchUserLocation{
		City: "Stockholm", Country: "SE",
	},
})
Gemini — google_search, code_execution, url_context:
OpenAI (Responses API) — web_search, file_search, code_interpreter. The
Responses API is a separate surface from Chat Completions; use
NewResponsesLLM instead of NewLLM:
client := llmopenai.NewResponsesLLM(
	llmopenai.WithResponsesAPIKey(os.Getenv("OPENAI_API_KEY")),
	llmopenai.WithResponsesModel(model.OpenAIModels[model.GPT5]),
	llmopenai.WithResponsesMaxTokens(1024),
	llmopenai.WithWebSearch(llmopenai.WebSearchOpts{
		SearchContextSize: llmopenai.SearchContextMedium,
	}),
	llmopenai.WithFileSearch("vs_abc123"),
	llmopenai.WithCodeInterpreter(),
)
WithWebSearchPreview is also available for models that don't yet support
the newer web_search tool.
Groq — browser_search, code_execution, visit_website (requires a
groq/compound* model via the dedicated NewCompoundLLM):
import llmgroq "github.com/joakimcarlsson/ai/llm/groq"
client := llmgroq.NewCompoundLLM(
	llmgroq.WithCompoundAPIKey(os.Getenv("GROQ_API_KEY")),
	llmgroq.WithCompoundModel(model.Model{APIModel: "groq/compound"}),
	llmgroq.WithBrowserSearch(llmgroq.BrowserSearchOpts{
		Country:       "us",
		IncludeImages: true,
	}),
	llmgroq.WithCodeExecution(),
	llmgroq.WithVisitWebsite(),
)
The regular llmgroq.NewLLM wrapper stays available for OpenAI-compatible
chat without built-ins.
xAI — web_search, x_search, code_execution via the Responses API (use
NewResponsesLLM instead of NewLLM):
import llmxai "github.com/joakimcarlsson/ai/llm/xai"
client := llmxai.NewResponsesLLM(
	llmxai.WithResponsesAPIKey(os.Getenv("XAI_API_KEY")),
	llmxai.WithResponsesModel(model.XAIModels[model.XAIGrok4]),
	llmxai.WithWebSearch(llmxai.WebSearchOpts{
		SearchContextSize: llmxai.SearchContextMedium,
	}),
	llmxai.WithXSearch(llmxai.XSearchOpts{
		AllowedXHandles: []string{"xai"},
		FromDate:        "2026-01-01",
	}),
	llmxai.WithCodeExecution(),
)
The thin llmxai.NewLLM wrapper remains available for OpenAI-compatible
chat without built-ins.
Cross-vendor wrappers
llm/azure (Azure OpenAI), llm/vertexai (Gemini on Vertex), and
llm/bedrock (Claude on Bedrock) are thin wrappers that delegate to their
underlying vendor module:
import llmazure "github.com/joakimcarlsson/ai/llm/azure"
client := llmazure.NewLLM(
	llmazure.WithAPIKey(os.Getenv("AZURE_OPENAI_KEY")),
	llmazure.WithEndpoint("https://my-resource.openai.azure.com"),
	llmazure.WithDeployment("my-chat-deployment"),
)
import llmbedrock "github.com/joakimcarlsson/ai/llm/bedrock"
// Region is read from $AWS_REGION (or $AWS_DEFAULT_REGION).
client := llmbedrock.NewLLM(
	llmbedrock.WithModel(model.AnthropicModels[model.Claude45Sonnet]),
	llmbedrock.WithMaxTokens(2000),
)
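Because the Bedrock region comes from the environment rather than from an option, a startup check can fail fast when neither variable is set. This is a minimal sketch using only the standard library; resolveRegion is a hypothetical helper, not part of llm/bedrock, and checking AWS_REGION before AWS_DEFAULT_REGION is an assumption about precedence.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// resolveRegion returns the first non-empty value of AWS_REGION and
// AWS_DEFAULT_REGION. The lookup function is injected so it can be tested
// without touching the real environment.
func resolveRegion(getenv func(string) string) (string, error) {
	for _, key := range []string{"AWS_REGION", "AWS_DEFAULT_REGION"} {
		if v := getenv(key); v != "" {
			return v, nil
		}
	}
	return "", errors.New("no AWS region: set AWS_REGION or AWS_DEFAULT_REGION")
}

func main() {
	region, err := resolveRegion(os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using region", region)
}
```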
Prompt caching is on by default on Bedrock: the underlying Anthropic client's
cache_control breakpoints reach Bedrock and populate CacheReadTokens /
CacheCreationTokens in the response usage. Pass llmbedrock.WithDisableCache()
to opt out. (Newer Claude models require at least 4096 cached tokens per
checkpoint before a cache hit is recorded.)
import llmvertex "github.com/joakimcarlsson/ai/llm/vertexai"
client := llmvertex.NewLLM(
	llmvertex.WithProject(os.Getenv("VERTEXAI_PROJECT")),
	llmvertex.WithLocation(os.Getenv("VERTEXAI_LOCATION")),
	llmvertex.WithModel(model.GeminiModels[model.Gemini25Pro]),
)
OpenAI-compatible providers (BYOM)
OpenRouter, Mistral, Ollama, LocalAI, etc. — point llm/openai at the right
base URL:
openrouter := llmopenai.NewLLM(
	llmopenai.WithAPIKey(os.Getenv("OPENROUTER_API_KEY")),
	llmopenai.WithBaseURL("https://openrouter.ai/api/v1"),
	llmopenai.WithModel(model.OpenAIModels[model.GPT5]),
)
Groq and xAI are published as their own modules (llm/groq, llm/xai) so
they can expose vendor-specific built-in tools on top of the OpenAI-compatible
surface. Use the thin NewLLM constructor in each for plain chat, or the
dedicated NewCompoundLLM / NewResponsesLLM for built-in tools.
Berget AI (Swedish, EU-hosted; open-weight models) ships as llm/berget, a
thin wrapper pinned to https://api.berget.ai/v1. Pricing in the model
catalog is in EUR (BergetModels, BergetEmbeddingModels,
BergetRerankerModels, BergetTranscriptionModels).
Like the other OpenAI-compatible wrappers, llm/berget aliases Option but
does not re-export the option constructors; pass the standard ones from
llm/openai:
import (
	llmberget "github.com/joakimcarlsson/ai/llm/berget"
	llmopenai "github.com/joakimcarlsson/ai/llm/openai"
	"github.com/joakimcarlsson/ai/model"
)
client := llmberget.NewLLM(
	llmopenai.WithAPIKey(os.Getenv("BERGET_API_KEY")),
	llmopenai.WithModel(model.BergetModels[model.BergetGPTOSS120B]),
	llmopenai.WithMaxTokens(1000),
)
For a managed registry of these, see BYOM.
Tracing
Every vendor's NewLLM(...) returns a tracing-wrapped client. Spans and metrics
are emitted automatically via OpenTelemetry. See Tracing
for setup.