## Endpoint

## Request Headers
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | `Bearer YOUR_API_KEY` |
| `Content-Type` | Yes | `application/json` |
| `HTTP-Referer` | No | Your application URL (for usage tracking) |
| `X-Title` | No | Your application name (for usage logs) |
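As a sketch, the headers above can be assembled like this (the helper name is illustrative; the two optional headers are included only when provided):

```python
def build_headers(api_key, referer=None, title=None):
    """Build the header dict for a chat completion request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if referer:
        headers["HTTP-Referer"] = referer  # optional: usage tracking
    if title:
        headers["X-Title"] = title  # optional: usage logs
    return headers
```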
## Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | string | Yes | — | Model ID in `provider/model` format. See Models. |
| `messages` | array | Yes | — | Conversation messages. Each has a `role` (`system`, `user`, or `assistant`) and `content`. |
| `stream` | boolean | No | `true` | Return the response as a stream of Server-Sent Events. |
| `tools` | array | No | — | Tool/function definitions in OpenAI format. |
| `temperature` | number | No | Model default | Sampling temperature (0.0 to 2.0). Lower values are more deterministic. |
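Putting the parameters above together, a minimal request body might look like this (the model ID and message content are placeholders):

```python
# Illustrative request body using the parameters described above.
payload = {
    "model": "provider/model",  # placeholder: use a real model ID
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "stream": True,      # the default; set False for a single JSON response
    "temperature": 0.7,  # optional: lower is more deterministic
}
```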
## Message Format

Each message in the `messages` array has this structure:
| Role | Purpose |
|---|---|
| `system` | Sets the model’s behavior and persona. Place it first in the array. |
| `user` | The human’s input. |
| `assistant` | Previous model responses (for multi-turn conversations). |
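For illustration, a `messages` array using all three roles might look like this (the content is made up):

```python
# Illustrative messages array: the assistant's earlier reply is
# included so the model can see the conversation so far.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "And its population?"},  # follow-up relies on context
]
```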
## Multi-Turn Conversation

Include previous messages to maintain context.

## Streaming Response

When `stream: true` (the default), the response is a series of Server-Sent Events. Each `data:` line contains a JSON chunk. Key fields:
| Field | Description |
|---|---|
| `id` | Generation ID, consistent across all chunks |
| `choices[0].delta.content` | The new text in this chunk |
| `choices[0].delta.reasoning` | Reasoning/thinking content (for reasoning models) |
| `choices[0].finish_reason` | `stop` when complete, `error` on failure |
| `usage` | Token counts and cost (included in the final chunk) |
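A sketch of consuming the stream, assuming each event is a `data:` line carrying a JSON chunk and the stream ends with a `[DONE]` sentinel (a common SSE convention; your provider may differ):

```python
import json

def collect_stream(lines):
    """Accumulate text deltas and the final usage object from SSE lines."""
    text, usage = [], None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        body = line[len("data: "):]
        if body.strip() == "[DONE]":
            break
        chunk = json.loads(body)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            text.append(delta["content"])
        if chunk.get("usage"):
            usage = chunk["usage"]  # present in the final chunk
    return "".join(text), usage
```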
## Usage Object

The final chunk includes token usage and cost:

| Field | Description |
|---|---|
| `prompt_tokens` | Total input tokens |
| `completion_tokens` | Total output tokens |
| `prompt_tokens_details.cached_tokens` | Tokens served from cache (reduces cost) |
| `cost` | Total cost in USD for this request |
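For illustration, reading these fields from a final chunk's usage object (all numbers are made up):

```python
# Illustrative usage object, following the field layout in the table above.
usage = {
    "prompt_tokens": 120,
    "completion_tokens": 48,
    "prompt_tokens_details": {"cached_tokens": 100},
    "cost": 0.00042,
}

# Total tokens processed, and how much of the input came from cache.
total = usage["prompt_tokens"] + usage["completion_tokens"]
cached = usage["prompt_tokens_details"]["cached_tokens"]
```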
## Non-Streaming Response

When `stream: false`, the response is a single JSON object:
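A sketch of the shape such a response takes, assuming the OpenAI-compatible format the rest of this API follows (field values are placeholders):

```python
# Illustrative non-streaming response body.
response = {
    "id": "gen-123",  # placeholder generation ID
    "choices": [
        {
            "message": {"role": "assistant", "content": "Hello! How can I help?"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 10, "completion_tokens": 7, "cost": 0.0001},
}

# The reply text lives at choices[0].message.content.
reply = response["choices"][0]["message"]["content"]
```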
## Tool Calling

You can define tools that the model can call using the OpenAI function calling format. When the model decides to call a tool, its response includes a `tool_calls` array:
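A sketch of a tool definition in the OpenAI function-calling format; the function name and parameter schema here are illustrative:

```python
# Illustrative `tools` array: one hypothetical function with a
# JSON Schema description of its parameters.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool name
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]
```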
## Reasoning Models

Some models support extended thinking (reasoning). When using these models, the streaming response may include reasoning content in the `delta.reasoning` field. Some providers return encrypted reasoning blocks via `delta.reasoning_details` that can be passed back in subsequent requests to preserve the reasoning trace.

Not all models support reasoning. See Models for which models have reasoning capabilities.
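As an illustration, reasoning and answer text can be accumulated separately from the streamed chunks, assuming the `delta.reasoning` and `delta.content` fields described above:

```python
def split_deltas(chunks):
    """Separate reasoning deltas from answer-text deltas in parsed chunks."""
    reasoning, content = [], []
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("reasoning"):
            reasoning.append(delta["reasoning"])
        if delta.get("content"):
            content.append(delta["content"])
    return "".join(reasoning), "".join(content)
```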

