Vision (Multimodal Images)

Send images to LLMs for analysis using URL references or raw binary data. Works with any provider that supports multimodal input (Anthropic, OpenAI, Gemini).

Image from URL

import (
    "fmt"
    "log"
    "github.com/joakimcarlsson/ai/message"
)

// client (a configured provider client) and ctx are assumed to exist.
msg := message.NewUserMessage("What do you see in this image?")
msg.AddImageURL("https://example.com/photo.jpg", "")

response, err := client.SendMessages(ctx, []message.Message{msg}, nil)
if err != nil {
    log.Fatal(err)
}
fmt.Println(response.Content)

The second argument to AddImageURL is an optional detail level ("low", "high", or "" for auto).
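If the detail level comes from configuration or user input, it may be worth validating it before calling AddImageURL. Below is a minimal sketch using only the Go standard library. normalizeDetail is a hypothetical helper (not part of the library), and accepting "auto" as an alias for "" is an assumption:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeDetail is a hypothetical helper (not part of the library) that maps
// free-form input onto the detail values documented for AddImageURL:
// "low", "high", or "" (auto).
func normalizeDetail(s string) (string, error) {
	switch d := strings.ToLower(strings.TrimSpace(s)); d {
	case "low", "high":
		return d, nil
	case "", "auto":
		// Assumption: "auto" is accepted as an alias for the empty string.
		return "", nil
	default:
		return "", fmt.Errorf("unsupported detail level %q", s)
	}
}

func main() {
	for _, in := range []string{"HIGH", "auto", "", "medium"} {
		d, err := normalizeDetail(in)
		fmt.Printf("%q -> %q, err=%v\n", in, d, err)
	}
}
```

The returned value would then be passed as the second argument to AddImageURL.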

Image from Binary Data

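imageData, err := os.ReadFile("photo.jpg")
if err != nil {
    log.Fatal(err)
}

msg := message.NewUserMessage("Describe this image.")
msg.AddBinary("image/jpeg", imageData) // the MIME type should match the file's real format

response, err := client.SendMessages(ctx, []message.Message{msg}, nil)
if err != nil {
    log.Fatal(err)
}
fmt.Println(response.Content)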
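Hardcoding "image/jpeg" stops being correct as soon as the file is a PNG or WebP. One option, assuming you prefer to trust file contents over file extensions, is to sniff the type with the standard library's http.DetectContentType, which inspects at most the first 512 bytes. imageMIMEType below is a hypothetical helper, not part of the library:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// imageMIMEType is a hypothetical helper that sniffs the MIME type from the
// data itself, so the value passed to AddBinary matches the actual format.
func imageMIMEType(data []byte) (string, error) {
	mt := http.DetectContentType(data) // looks at no more than 512 bytes
	if !strings.HasPrefix(mt, "image/") {
		return "", fmt.Errorf("not a recognized image: detected %s", mt)
	}
	return mt, nil
}

func main() {
	// The 8-byte PNG signature is enough for detection.
	png := []byte("\x89PNG\r\n\x1a\n")
	mt, err := imageMIMEType(png)
	fmt.Println(mt, err)
}
```

With this helper, the example above would call mt, err := imageMIMEType(imageData) and then msg.AddBinary(mt, imageData). Content sniffing does not recognize every image format (SVG, for instance, is reported as a text type), so a rejection means "unrecognized" rather than strictly "not an image".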

Multiple Images

msg := message.NewUserMessage("Compare these two images.")
msg.AddImageURL("https://example.com/before.jpg", "")
msg.AddImageURL("https://example.com/after.jpg", "")

response, err := client.SendMessages(ctx, []message.Message{msg}, nil)
if err != nil {
    log.Fatal(err)
}
fmt.Println(response.Content)
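URL images are fetched by the provider or the library rather than by your own code, so a malformed or relative URL may only surface as an error from SendMessages. A small pre-flight check can catch that earlier. What to check here (an absolute http or https URL with a host) is an assumption, and validateImageURLs is a hypothetical helper built on net/url. It keeps the input order, since prompts such as "compare before and after" refer to images by position:

```go
package main

import (
	"fmt"
	"net/url"
)

// validateImageURLs is a hypothetical pre-flight check: every entry must be an
// absolute http(s) URL with a host. Output order matches input order.
func validateImageURLs(raw []string) ([]string, error) {
	out := make([]string, 0, len(raw))
	for i, s := range raw {
		u, err := url.Parse(s)
		if err != nil {
			return nil, fmt.Errorf("image %d: %w", i, err)
		}
		if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
			return nil, fmt.Errorf("image %d: %q is not an absolute http(s) URL", i, s)
		}
		out = append(out, u.String())
	}
	return out, nil
}

func main() {
	urls, err := validateImageURLs([]string{
		"https://example.com/before.jpg",
		"https://example.com/after.jpg",
	})
	fmt.Println(urls, err)
}
```

Each returned URL could then be added in turn with msg.AddImageURL(u, "").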

MultiModalMessage

For full control over each content part, build the message directly from a slice of message.MultiModalContent values, or from text plus a list of attachments:

msg := message.NewUserMultiModalMessage([]message.MultiModalContent{
    message.NewTextContent("What's in this image?"),
    message.NewImageURLContent("https://example.com/photo.jpg", "high"),
})

// Or with attachments (pngData and jpegData are []byte, e.g. read with os.ReadFile)
attachMsg := message.NewUserMultiModalMessageWithAttachments(
    "Describe these files.",
    []message.Attachment{
        {MIMEType: "image/png", Data: pngData},
        {MIMEType: "image/jpeg", Data: jpegData},
    },
)

Content Types

| Type | Constructor | Description |
|---|---|---|
| `text` | `NewTextContent(text)` | Text content |
| `image_url` | `NewImageURLContent(url, detail)` | Image from URL |
| `binary` | `NewBinaryContent(mimeType, data)` | Raw binary data (base64-encoded for the provider) |
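The binary row notes that raw data is base64-encoded before it reaches the provider. The library handles this, and the exact wire format is provider-specific (OpenAI-style APIs, for instance, accept a data: URL in an image_url part, while Anthropic takes a base64 source object with a separate media type). Purely to illustrate the encoding step, here is a minimal sketch that builds an RFC 2397 data URL; dataURL is a hypothetical helper and not what the library necessarily sends:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// dataURL (hypothetical) encodes raw bytes as an RFC 2397 data URL:
// data:<mime>;base64,<standard base64 with padding>.
func dataURL(mimeType string, data []byte) string {
	return "data:" + mimeType + ";base64," + base64.StdEncoding.EncodeToString(data)
}

func main() {
	fmt.Println(dataURL("image/png", []byte{0x89, 'P', 'N', 'G'}))
}
```

Base64 output is about 4/3 the size of the input (4 characters per 3 bytes), which matters when a provider's size limit applies to the encoded request.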

Supported Formats

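Most providers accept JPEG, PNG, GIF, and WebP. When sending binary data or attachments, pass the matching MIME type (image/jpeg, image/png, image/gif, or image/webp). Check your provider's documentation for file size and image dimension limits.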
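To fail fast before a request, you could check the MIME type against the formats listed above and compare the payload with a size limit. This is a minimal sketch; checkImage is hypothetical, and maxBytes is left to the caller because limits differ by provider and model (the 1 MiB value in main is an arbitrary example, not a real provider limit):

```go
package main

import "fmt"

// supportedImageTypes lists the formats most providers accept, per the docs.
var supportedImageTypes = map[string]bool{
	"image/jpeg": true,
	"image/png":  true,
	"image/gif":  true,
	"image/webp": true,
}

// checkImage (hypothetical) rejects unsupported MIME types and payloads larger
// than a caller-supplied byte limit taken from the provider's documentation.
func checkImage(mimeType string, data []byte, maxBytes int) error {
	if !supportedImageTypes[mimeType] {
		return fmt.Errorf("unsupported image type %q", mimeType)
	}
	if len(data) > maxBytes {
		return fmt.Errorf("image is %d bytes, limit is %d", len(data), maxBytes)
	}
	return nil
}

func main() {
	const limit = 1 << 20 // 1 MiB, an arbitrary example value
	fmt.Println(checkImage("image/webp", make([]byte, 1024), limit))
	fmt.Println(checkImage("image/bmp", make([]byte, 1024), limit))
	fmt.Println(checkImage("image/png", make([]byte, 2<<20), limit))
}
```

If your provider's limit applies to the encoded payload rather than the raw file, compare base64.StdEncoding.EncodedLen(len(data)) against the limit instead.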