## Endpoint

## Request Headers
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | `Bearer YOUR_API_KEY` |
| `Content-Type` | Yes | `application/json` |
| `HTTP-Referer` | No | Your application URL (for usage tracking) |
| `X-Title` | No | Your application name (for usage logs) |
## Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | string | Yes | — | Model ID in `provider/model` format. See Models. |
| `messages` | array | Yes | — | Conversation messages. Each has a `role` (`system`, `user`, `assistant`) and `content`. |
| `stream` | boolean | No | `true` | Return the response as a stream of Server-Sent Events. |
| `tools` | array | No | — | Tool/function definitions in OpenAI format. |
| `temperature` | number | No | Model default | Sampling temperature (0.0 to 2.0). Lower values are more deterministic. |
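The parameters above can be combined into a request body like the following sketch; the model ID and message contents are placeholders, not real values:

```python
import json

# Hypothetical request body built from the parameters documented above.
# "provider/model-name" is a placeholder; substitute an ID from the Models list.
payload = {
    "model": "provider/model-name",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "stream": True,        # the default; set False for a single JSON response
    "temperature": 0.7,    # optional, 0.0 to 2.0
}

body = json.dumps(payload)
```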
## Message Format

Each message in the `messages` array has this structure:
| Role | Purpose |
|---|---|
| `system` | Sets the model’s behavior and persona. Place it first in the array. |
| `user` | The human’s input. |
| `assistant` | Previous model responses (for multi-turn conversations). |
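As an illustration, a `messages` array combining the three roles across a multi-turn exchange might look like this (the contents are invented):

```python
# Invented multi-turn conversation using all three roles.
messages = [
    {"role": "system", "content": "You are a concise assistant."},  # persona, first
    {"role": "user", "content": "What is the capital of France?"},  # human turn
    {"role": "assistant", "content": "Paris."},                     # earlier model reply
    {"role": "user", "content": "And of Germany?"},                 # follow-up turn
]

roles = [m["role"] for m in messages]
```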
## Multi-Turn Conversation
Include previous messages to maintain context.

## Streaming Response
When `stream: true` (the default), the response is a series of Server-Sent Events. Each `data:` line contains a JSON chunk. Key fields:
| Field | Description |
|---|---|
| `id` | Generation ID, consistent across all chunks |
| `choices[0].delta.content` | The new text in this chunk |
| `choices[0].delta.reasoning` | Reasoning/thinking content (for reasoning models) |
| `choices[0].finish_reason` | `stop` when complete, `error` on failure |
| `usage` | Token counts and cost (included in the final chunk) |
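A minimal sketch of consuming the stream, assuming each event arrives as a `data:` line of JSON; the `data: [DONE]` end-of-stream sentinel is an assumption here and may differ by gateway:

```python
import json

def collect_stream(lines):
    """Accumulate text and the usage object from SSE `data:` lines."""
    text, usage = [], None
    for line in lines:
        if not line.startswith("data: "):
            continue                     # skip comments / keep-alives
        chunk = line[len("data: "):]
        if chunk == "[DONE]":            # assumed end-of-stream sentinel
            break
        event = json.loads(chunk)
        delta = event["choices"][0]["delta"]
        if delta.get("content"):
            text.append(delta["content"])
        if event.get("usage"):
            usage = event["usage"]       # present in the final chunk
    return "".join(text), usage

# Fabricated sample events for illustration:
sample = [
    'data: {"id": "gen-1", "choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"id": "gen-1", "choices": [{"delta": {"content": "lo"}}], '
    '"usage": {"prompt_tokens": 5, "completion_tokens": 2, "cost": 0.0001}}',
    "data: [DONE]",
]
full_text, final_usage = collect_stream(sample)
```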
## Usage Object

The final chunk includes token usage and cost:

| Field | Description |
|---|---|
| `prompt_tokens` | Total input tokens |
| `completion_tokens` | Total output tokens |
| `prompt_tokens_details.cached_tokens` | Tokens served from cache (reduces cost) |
| `cost` | Total cost in USD for this request |
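Given a usage object with the fields above, the cached versus full-rate split can be inspected like this (the numbers are invented):

```python
# Invented usage object matching the fields documented above.
usage = {
    "prompt_tokens": 1200,
    "completion_tokens": 300,
    "prompt_tokens_details": {"cached_tokens": 1000},
    "cost": 0.0042,   # USD, invented
}

cached = usage["prompt_tokens_details"]["cached_tokens"]
uncached_prompt = usage["prompt_tokens"] - cached   # tokens billed at the full rate
total_tokens = usage["prompt_tokens"] + usage["completion_tokens"]
```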
## Non-Streaming Response

When `stream: false`, the response is a single JSON object.
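Assuming the non-streaming body follows the OpenAI chat-completion shape (a sketch; exact fields may vary by gateway), the answer text can be read like this:

```python
# Fabricated non-streaming response for illustration:
response = {
    "id": "gen-abc123",
    "choices": [
        {
            "message": {"role": "assistant", "content": "Hello there!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 3, "cost": 0.0001},
}

answer = response["choices"][0]["message"]["content"]
```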
## Tool Calling

You can define tools for the model to call, using the OpenAI function calling format. When the model invokes a tool, the assistant message contains a `tool_calls` array.
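A tool definition in the OpenAI function calling format, together with a fabricated `tool_calls` entry showing how to read it (the tool name and arguments are invented):

```python
import json

# Invented example tool, defined in the OpenAI function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Fabricated assistant message containing a tool_calls array:
assistant_message = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }
    ],
}

call = assistant_message["tool_calls"][0]["function"]
args = json.loads(call["arguments"])   # arguments arrive as a JSON string
```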
## Reasoning Models

Some models support extended thinking (reasoning). When using these models, streaming chunks may include reasoning content in the `delta.reasoning` field. Some providers return encrypted reasoning blocks via `delta.reasoning_details` that can be passed back in subsequent requests to preserve the reasoning trace.
Not all models support reasoning. See Models for which models have reasoning capabilities.
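A sketch of separating reasoning content from answer text when reading streaming deltas; the chunk shapes below are invented for illustration:

```python
def split_deltas(deltas):
    """Accumulate delta.reasoning and delta.content separately."""
    reasoning, answer = [], []
    for delta in deltas:
        if delta.get("reasoning"):
            reasoning.append(delta["reasoning"])
        if delta.get("content"):
            answer.append(delta["content"])
    return "".join(reasoning), "".join(answer)

# Fabricated deltas from a hypothetical reasoning model:
thinking, text = split_deltas([
    {"reasoning": "The user wants a sum. "},
    {"reasoning": "2 + 2 = 4."},
    {"content": "The answer is 4."},
])
```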
## Complete Example
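An end-to-end sketch using only the Python standard library. The endpoint URL and API key are placeholders (substitute your gateway's real values); the request is built here but only sent when uncommented with real credentials:

```python
import json
import urllib.request

API_URL = "https://example.invalid/v1/chat/completions"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"                                 # placeholder key

def build_request(model, messages, stream=False):
    """Build an urllib Request with the headers and body documented above."""
    payload = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request(
    "provider/model-name",
    [{"role": "user", "content": "Hello!"}],
)

# To actually send (requires a real endpoint and key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```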
## Related
- **Models**: Browse available models and their capabilities.
- **Errors**: Handle errors and implement retry logic.
- **SDK Examples**: Use this endpoint from Python, Node.js, and more.
- **Authentication**: API key management and security practices.

