BYOM (Bring Your Own Model)

Use Ollama, LocalAI, vLLM, LM Studio, or any OpenAI-compatible inference server. After the modular refactor, BYOM is just llm/openai with a custom base URL — no separate code path. The llm module also ships a tiny config registry for organising multiple BYOM endpoints.

Direct setup

import (
    "log"

    llmopenai "github.com/joakimcarlsson/ai/llm/openai"
    "github.com/joakimcarlsson/ai/model"
)

llamaModel := model.NewCustomModel(
    model.WithModelID("llama3.2"),
    model.WithAPIModel("llama3.2:latest"),
    model.WithContextWindow(128_000),
)

client := llmopenai.NewLLM(
    llmopenai.WithBaseURL("http://localhost:11434/v1"),
    llmopenai.WithModel(llamaModel),
    llmopenai.WithMaxTokens(2000),
)

response, err := client.SendMessages(ctx, messages, nil)
if err != nil {
    log.Fatal(err)
}

That's the whole story for a single endpoint.
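Switching to a different server is the same code with a different base URL and model description. A sketch pointing at a local vLLM instance, reusing only the options shown above (the model IDs are placeholders, not tested values):

vllmModel := model.NewCustomModel(
    model.WithModelID("qwen2.5-7b-instruct"),          // placeholder ID
    model.WithAPIModel("Qwen/Qwen2.5-7B-Instruct"),    // name the server expects
    model.WithContextWindow(32_768),
)

client := llmopenai.NewLLM(
    llmopenai.WithBaseURL("http://localhost:8000/v1"), // vLLM's default port
    llmopenai.WithModel(vllmModel),
    llmopenai.WithMaxTokens(2000),
)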

Registry helper (multiple endpoints)

When you've got several BYOM configurations and want to pass them around as opaque IDs rather than re-typing URLs, use the llm module's registry:

import "github.com/joakimcarlsson/ai/llm"

ollama := llm.RegisterCustomProvider("ollama", llm.CustomProviderConfig{
    BaseURL:      "http://localhost:11434/v1",
    DefaultModel: llamaModel,
})

// Later, in some other part of your code:
cfg, ok := llm.GetCustomProvider(ollama)
if !ok {
    log.Fatal("unknown provider")
}

client := llmopenai.NewLLM(
    llmopenai.WithBaseURL(cfg.BaseURL),
    llmopenai.WithExtraHeaders(cfg.ExtraHeaders),
    llmopenai.WithModel(cfg.DefaultModel),
)

The registry stores config; you still construct the client explicitly with llmopenai.NewLLM(...). There's no implicit factory that dispatches based on provider ID — the modular refactor removed that, so callers know exactly which vendor module they're invoking.
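The registry config can also carry the extra headers a remote endpoint needs, so a lookup hands you everything required to build the client. A sketch for a server behind auth (the hostname and token are placeholders):

// A second, remote endpoint with its auth header stored alongside the URL.
remote := llm.RegisterCustomProvider("remote-vllm", llm.CustomProviderConfig{
    BaseURL:      "https://inference.example.internal/v1",
    DefaultModel: llamaModel,
    ExtraHeaders: map[string]string{
        "Authorization": "Bearer my-token",
    },
})

Retrieval works exactly as above: llm.GetCustomProvider(remote) returns the config, including ExtraHeaders, and you pass it to llmopenai.NewLLM yourself.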

Custom model options

customModel := model.NewCustomModel(
    model.WithModelID("my-model"),
    model.WithAPIModel("my-model-v1"),         // sent in API requests
    model.WithName("My Custom Model"),         // human-readable
    model.WithProvider("my-provider"),         // provider identifier
    model.WithContextWindow(131_072),
    model.WithDefaultMaxTokens(4096),
    model.WithStructuredOutput(true),
    model.WithAttachments(true),
    model.WithReasoning(true),
    model.WithImageGeneration(false),
    model.WithCostPer1MIn(1.50),
    model.WithCostPer1MOut(5.00),
    model.WithCostPer1MInCached(0.15),
    model.WithCostPer1MOutCached(2.50),
)

| Option | Description | Default |
|---|---|---|
| WithModelID(id) | Unique identifier | "" |
| WithAPIModel(name) | Model name sent in API requests | "" |
| WithName(name) | Human-readable display name | "" |
| WithProvider(provider) | Provider identifier | "custom" |
| WithContextWindow(tokens) | Max input context size | 0 |
| WithDefaultMaxTokens(tokens) | Recommended max output tokens | 0 |
| WithStructuredOutput(bool) | Enable structured JSON output | false |
| WithAttachments(bool) | Enable image/file inputs | false |
| WithReasoning(bool) | Enable chain-of-thought | false |
| WithImageGeneration(bool) | Enable image generation | false |
| WithCostPer1MIn(cost) | Input token cost per million | 0 |
| WithCostPer1MOut(cost) | Output token cost per million | 0 |
| WithCostPer1MInCached(cost) | Cached input token cost per million | 0 |
| WithCostPer1MOutCached(cost) | Cached output token cost per million | 0 |

Setting these correctly enables features like structured output, context strategies, and cost tracking for your custom model.
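The cost options are all per million tokens, so a request's cost is a straightforward multiplication. A rough sketch of the arithmetic a tracker would do (plain numbers, not a library API):

// Illustrative arithmetic only: local variables, not library calls.
inputTokens := 12_000.0
outputTokens := 800.0
costPer1MIn := 1.50
costPer1MOut := 5.00

requestCost := inputTokens/1_000_000*costPer1MIn + outputTokens/1_000_000*costPer1MOut
fmt.Printf("request cost: $%.4f\n", requestCost) // $0.0220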

Extra headers

For tenant headers, custom auth, etc.:

client := llmopenai.NewLLM(
    llmopenai.WithBaseURL("https://my-service.com/v1"),
    llmopenai.WithExtraHeaders(map[string]string{
        "X-API-Tenant":  "my-tenant",
        "Authorization": "Bearer my-token",
    }),
    llmopenai.WithModel(customModel),
)
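In practice you would likely read the tenant and token from the environment rather than hardcoding them; a minimal sketch (the env var names are placeholders):

headers := map[string]string{
    "X-API-Tenant":  os.Getenv("MY_SERVICE_TENANT"),
    "Authorization": "Bearer " + os.Getenv("MY_SERVICE_TOKEN"),
}

client := llmopenai.NewLLM(
    llmopenai.WithBaseURL("https://my-service.com/v1"),
    llmopenai.WithExtraHeaders(headers),
    llmopenai.WithModel(customModel),
)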

Streaming

Streaming works the same as any other vendor:

stream := client.StreamResponse(ctx, messages, nil)

for event := range stream {
    switch event.Type {
    case types.EventContentDelta:
        fmt.Print(event.Content)
    case types.EventComplete:
        fmt.Println()
    case types.EventError:
        log.Fatal(event.Error)
    }
}
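If you also need the full text after streaming it to the user, accumulate the deltas as they arrive; a sketch using only the event fields shown above:

var full strings.Builder

for event := range client.StreamResponse(ctx, messages, nil) {
    switch event.Type {
    case types.EventContentDelta:
        fmt.Print(event.Content)        // stream to the terminal...
        full.WriteString(event.Content) // ...and keep a copy
    case types.EventComplete:
        fmt.Println()
    case types.EventError:
        log.Fatal(event.Error)
    }
}

_ = full.String() // complete response text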

Supported servers

Any server that implements the OpenAI-compatible /chat/completions API:

  • Ollama: http://localhost:11434/v1
  • LocalAI: http://localhost:8080/v1
  • vLLM: http://localhost:8000/v1
  • LM Studio: http://localhost:1234/v1
  • Groq: https://api.groq.com/openai/v1 (cloud, OpenAI-compatible)
  • OpenRouter: https://openrouter.ai/api/v1 (cloud, multi-vendor proxy)
  • xAI: https://api.x.ai/v1
  • Mistral: https://api.mistral.ai/v1
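
The cloud entries work the same way; the only difference is that the API key travels in a header. A sketch for Groq, assuming its standard Bearer-token auth (the model IDs below are placeholders; check Groq's model list):

groqModel := model.NewCustomModel(
    model.WithModelID("groq-llama-3.3-70b"),
    model.WithAPIModel("llama-3.3-70b-versatile"),
    model.WithContextWindow(128_000),
)

client := llmopenai.NewLLM(
    llmopenai.WithBaseURL("https://api.groq.com/openai/v1"),
    llmopenai.WithExtraHeaders(map[string]string{
        "Authorization": "Bearer " + os.Getenv("GROQ_API_KEY"),
    }),
    llmopenai.WithModel(groqModel),
)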