BYOM (Bring Your Own Model)

Use Ollama, LocalAI, vLLM, LM Studio, or any OpenAI-compatible inference server. After the modular refactor, BYOM is just llm/openai with a custom base URL — no separate code path. The llm module also ships a tiny config registry for organising multiple BYOM endpoints.

Direct setup

import (
    "log"

    llmopenai "github.com/joakimcarlsson/ai/llm/openai"
    "github.com/joakimcarlsson/ai/model"
)

llamaModel := model.NewCustomModel(
    model.WithModelID("llama3.2"),
    model.WithAPIModel("llama3.2:latest"),
    model.WithContextWindow(128_000),
)

client := llmopenai.NewLLM(
    llmopenai.WithBaseURL("http://localhost:11434/v1"),
    llmopenai.WithModel(llamaModel),
    llmopenai.WithMaxTokens(2000),
)

response, err := client.SendMessages(ctx, messages, nil)
if err != nil {
    log.Fatal(err)
}

That's the whole story for a single endpoint.
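Switching to a different server is the same code with a different base URL and model description. A sketch pointing at a local vLLM instance, reusing only the options shown above (the model IDs are placeholders, not tested values):

vllmModel := model.NewCustomModel(
    model.WithModelID("qwen2.5-7b-instruct"),          // placeholder ID
    model.WithAPIModel("Qwen/Qwen2.5-7B-Instruct"),    // name the server expects
    model.WithContextWindow(32_768),
)

client := llmopenai.NewLLM(
    llmopenai.WithBaseURL("http://localhost:8000/v1"), // vLLM's default port
    llmopenai.WithModel(vllmModel),
    llmopenai.WithMaxTokens(2000),
)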

Registry helper (multiple endpoints)

When you've got several BYOM configurations and want to pass them around as opaque IDs rather than re-typing URLs, use the llm module's registry:

import "github.com/joakimcarlsson/ai/llm"

ollama := llm.RegisterCustomProvider("ollama", llm.CustomProviderConfig{
    BaseURL:      "http://localhost:11434/v1",
    DefaultModel: llamaModel,
})

// Later, in some other part of your code:
cfg, ok := llm.GetCustomProvider(ollama)
if !ok {
    log.Fatal("unknown provider")
}

client := llmopenai.NewLLM(
    llmopenai.WithBaseURL(cfg.BaseURL),
    llmopenai.WithExtraHeaders(cfg.ExtraHeaders),
    llmopenai.WithModel(cfg.DefaultModel),
)

The registry stores config; you still construct the client explicitly with llmopenai.NewLLM(...). There's no implicit factory that dispatches based on provider ID — the modular refactor removed that, so callers know exactly which vendor module they're invoking.
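The registry config can also carry the extra headers a remote endpoint needs, so a lookup hands you everything required to build the client. A sketch for a server behind auth (the hostname and token are placeholders):

// A second, remote endpoint with its auth header stored alongside the URL.
remote := llm.RegisterCustomProvider("remote-vllm", llm.CustomProviderConfig{
    BaseURL:      "https://inference.example.internal/v1",
    DefaultModel: llamaModel,
    ExtraHeaders: map[string]string{
        "Authorization": "Bearer my-token",
    },
})

Retrieval works exactly as above: llm.GetCustomProvider(remote) returns the config, including ExtraHeaders, and you pass it to llmopenai.NewLLM yourself.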

Custom model options

customModel := model.NewCustomModel(
    model.WithModelID("my-model"),
    model.WithAPIModel("my-model-v1"),         // sent in API requests
    model.WithName("My Custom Model"),         // human-readable
    model.WithProvider("my-provider"),         // provider identifier
    model.WithContextWindow(131_072),
    model.WithDefaultMaxTokens(4096),
    model.WithStructuredOutput(true),
    model.WithAttachments(true),
    model.WithReasoning(true),
    model.WithImageGeneration(false),
    model.WithCostPer1MIn(1.50),
    model.WithCostPer1MOut(5.00),
    model.WithCostPer1MInCached(0.15),
    model.WithCostPer1MOutCached(2.50),
)

| Option | Description | Default |
|---|---|---|
| WithModelID(id) | Unique identifier | "" |
| WithAPIModel(name) | Model name sent in API requests | "" |
| WithName(name) | Human-readable display name | "" |
| WithProvider(provider) | Provider identifier | "custom" |
| WithContextWindow(tokens) | Max input context size | 0 |
| WithDefaultMaxTokens(tokens) | Recommended max output tokens | 0 |
| WithStructuredOutput(bool) | Enable structured JSON output | false |
| WithAttachments(bool) | Enable image/file inputs | false |
| WithReasoning(bool) | Enable chain-of-thought | false |
| WithImageGeneration(bool) | Enable image generation | false |
| WithCostPer1MIn(cost) | Input token cost per million | 0 |
| WithCostPer1MOut(cost) | Output token cost per million | 0 |
| WithCostPer1MInCached(cost) | Cached input token cost per million | 0 |
| WithCostPer1MOutCached(cost) | Cached output token cost per million | 0 |

Setting these correctly enables features like structured output, context strategies, and cost tracking for your custom model.
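The cost options are all per million tokens, so a request's cost is a straightforward multiplication. A rough sketch of the arithmetic a tracker would do (plain numbers, not a library API):

// Illustrative arithmetic only: local variables, not library calls.
inputTokens := 12_000.0
outputTokens := 800.0
costPer1MIn := 1.50
costPer1MOut := 5.00

requestCost := inputTokens/1_000_000*costPer1MIn + outputTokens/1_000_000*costPer1MOut
fmt.Printf("request cost: $%.4f\n", requestCost) // $0.0220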

Extra headers

For tenant headers, custom auth, etc.:

client := llmopenai.NewLLM(
    llmopenai.WithBaseURL("https://my-service.com/v1"),
    llmopenai.WithExtraHeaders(map[string]string{
        "X-API-Tenant":  "my-tenant",
        "Authorization": "Bearer my-token",
    }),
    llmopenai.WithModel(customModel),
)
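In practice you would likely read the tenant and token from the environment rather than hardcoding them; a minimal sketch (the env var names are placeholders):

headers := map[string]string{
    "X-API-Tenant":  os.Getenv("MY_SERVICE_TENANT"),
    "Authorization": "Bearer " + os.Getenv("MY_SERVICE_TOKEN"),
}

client := llmopenai.NewLLM(
    llmopenai.WithBaseURL("https://my-service.com/v1"),
    llmopenai.WithExtraHeaders(headers),
    llmopenai.WithModel(customModel),
)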

Streaming

Streaming works the same as any other vendor:

stream := client.StreamResponse(ctx, messages, nil)

for event := range stream {
    switch event.Type {
    case types.EventContentDelta:
        fmt.Print(event.Content)
    case types.EventComplete:
        fmt.Println()
    case types.EventError:
        log.Fatal(event.Error)
    }
}
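If you also need the full text after streaming it to the user, accumulate the deltas as they arrive; a sketch using only the event fields shown above:

var full strings.Builder

for event := range client.StreamResponse(ctx, messages, nil) {
    switch event.Type {
    case types.EventContentDelta:
        fmt.Print(event.Content)        // stream to the terminal...
        full.WriteString(event.Content) // ...and keep a copy
    case types.EventComplete:
        fmt.Println()
    case types.EventError:
        log.Fatal(event.Error)
    }
}

_ = full.String() // complete response text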

Supported servers

Any server that implements the OpenAI-compatible /chat/completions API:

  • Ollama: http://localhost:11434/v1
  • LocalAI: http://localhost:8080/v1
  • vLLM: http://localhost:8000/v1
  • LM Studio: http://localhost:1234/v1
  • Groq: https://api.groq.com/openai/v1 (cloud, OpenAI-compatible)
  • OpenRouter: https://openrouter.ai/api/v1 (cloud, multi-vendor proxy)
  • xAI: https://api.x.ai/v1
  • Mistral: https://api.mistral.ai/v1
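
The cloud entries work the same way; the only difference is that the API key travels in a header. A sketch for Groq, assuming its standard Bearer-token auth (the model IDs below are placeholders; check Groq's model list):

groqModel := model.NewCustomModel(
    model.WithModelID("groq-llama-3.3-70b"),
    model.WithAPIModel("llama-3.3-70b-versatile"),
    model.WithContextWindow(128_000),
)

client := llmopenai.NewLLM(
    llmopenai.WithBaseURL("https://api.groq.com/openai/v1"),
    llmopenai.WithExtraHeaders(map[string]string{
        "Authorization": "Bearer " + os.Getenv("GROQ_API_KEY"),
    }),
    llmopenai.WithModel(groqModel),
)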