Batch Processing

Process bulk LLM and embedding requests efficiently using provider-native batch APIs or bounded concurrent execution.

Native Batch APIs

Native batch APIs submit all requests as a single job that processes asynchronously on the provider side. Providers may offer reduced pricing for batch workloads (see the provider support table below for details). Results are typically returned within 24 hours, often much faster.

OpenAI

import (
    "github.com/joakimcarlsson/ai/batch"
    "github.com/joakimcarlsson/ai/model"
)

proc, _ := batch.New(
    model.ProviderOpenAI,
    batch.WithAPIKey("your-api-key"),
    batch.WithModel(model.OpenAIModels[model.GPT4o]),
    batch.WithPollInterval(30 * time.Second),
)

requests := []batch.Request{
    {
        ID:   "q1",
        Type: batch.RequestTypeChat,
        Messages: []message.Message{
            message.NewUserMessage("What is the capital of France?"),
        },
    },
    {
        ID:   "q2",
        Type: batch.RequestTypeChat,
        Messages: []message.Message{
            message.NewUserMessage("What is the capital of Japan?"),
        },
    },
}

resp, err := proc.Process(ctx, requests)
for _, r := range resp.Results {
    if r.Err != nil {
        fmt.Printf("[%s] Error: %v\n", r.ID, r.Err)
        continue
    }
    fmt.Printf("[%s] %s\n", r.ID, r.ChatResponse.Content)
}

Anthropic

proc, _ := batch.New(
    model.ProviderAnthropic,
    batch.WithAPIKey("your-api-key"),
    batch.WithModel(model.AnthropicModels[model.Claude4Sonnet]),
    batch.WithMaxTokens(1024),
    batch.WithPollInterval(30 * time.Second),
)

Gemini / Vertex AI

proc, _ := batch.New(
    model.ProviderGemini,
    batch.WithAPIKey("your-api-key"),
    batch.WithModel(model.GeminiModels[model.Gemini25Flash]),
    batch.WithPollInterval(30 * time.Second),
)

Concurrent Fallback

For providers without native batch APIs, pass an existing LLM client. Requests run concurrently with a configurable concurrency limit.

client, _ := llm.NewLLM(model.ProviderGroq,
    llm.WithAPIKey("your-api-key"),
    llm.WithModel(model.GroqModels[model.Llama4Scout]),
)

proc, _ := batch.New(
    model.ProviderGroq,
    batch.WithLLM(client),
    batch.WithMaxConcurrency(10),
)

resp, _ := proc.Process(ctx, requests)

Batch Embeddings

embedder, _ := embeddings.NewEmbedding(model.ProviderVoyage,
    embeddings.WithAPIKey("your-api-key"),
    embeddings.WithModel(model.VoyageEmbeddingModels[model.Voyage35]),
)

proc, _ := batch.New(
    model.ProviderVoyage,
    batch.WithEmbedding(embedder),
    batch.WithMaxConcurrency(5),
)

requests := []batch.Request{
    {ID: "doc1", Type: batch.RequestTypeEmbedding, Texts: []string{"first document"}},
    {ID: "doc2", Type: batch.RequestTypeEmbedding, Texts: []string{"second document"}},
}

resp, _ := proc.Process(ctx, requests)

Provider Support

Provider	Native Batch	Discount (as of writing)	Supported Endpoints
OpenAI	✅	50%	Chat, Embeddings
Anthropic	✅	50%	Messages
Gemini	✅	50%	Content, Embeddings
Vertex AI	✅	~50%	Content, Embeddings
All others	Concurrent fallback	—	Chat, Embeddings

Progress Tracking

Callback

proc, _ := batch.New(
    model.ProviderOpenAI,
    batch.WithAPIKey("your-api-key"),
    batch.WithModel(model.OpenAIModels[model.GPT4o]),
    batch.WithProgressCallback(func(p batch.Progress) {
        fmt.Printf("%d/%d completed, %d failed [%s]\n",
            p.Completed, p.Total, p.Failed, p.Status)
    }),
)

Async Channel

ch, err := proc.ProcessAsync(ctx, requests)

for event := range ch {
    switch event.Type {
    case batch.EventItem:
        fmt.Printf("[%s] done\n", event.Result.ID)
    case batch.EventProgress:
        fmt.Printf("%d/%d\n", event.Progress.Completed, event.Progress.Total)
    case batch.EventComplete:
        fmt.Println("all done")
    case batch.EventError:
        fmt.Printf("batch error: %v\n", event.Err)
    }
}

Error Handling

Individual request failures never fail the batch. Each result carries its own error.

resp, err := proc.Process(ctx, requests)

for _, r := range resp.Results {
    if r.Err != nil {
        continue
    }
    // use r.ChatResponse or r.EmbedResponse
}

fmt.Printf("Completed: %d, Failed: %d\n", resp.Completed, resp.Failed)

Options

Option	Description	Default
`WithAPIKey(key)`	API key for native batch providers	—
`WithModel(model)`	LLM model for chat batch requests	—
`WithEmbeddingModel(model)`	Embedding model for embedding batch requests	—
`WithMaxTokens(n)`	Max tokens per request	4096
`WithLLM(client)`	Existing LLM client for concurrent fallback	—
`WithEmbedding(client)`	Existing embedding client for concurrent fallback	—
`WithMaxConcurrency(n)`	Max parallel requests in concurrent mode	10
`WithProgressCallback(fn)`	Progress update callback	—
`WithPollInterval(d)`	Polling interval for native batch APIs	30s
`WithTimeout(d)`	Request timeout	—
`WithOpenAIOptions(...)`	OpenAI-specific options (base URL, headers)	—
`WithGeminiOptions(...)`	Gemini-specific options (backend)	—