> ## Documentation Index
> Fetch the complete documentation index at: https://docs.cline.bot/llms.txt
> Use this file to discover all available pages before exploring further.

# Z AI (Zhipu AI)

> Learn how to configure and use Z AI's GLM models with Cline. Experience advanced hybrid reasoning, agentic capabilities, and open-source excellence with regional optimization.

Z AI (formerly Zhipu AI) offers the GLM model series, featuring hybrid reasoning capabilities and agentic AI design. These models excel in unified reasoning, coding, and intelligent agent applications while maintaining open-source accessibility under MIT license.

**Website:** [https://z.ai/model-api](https://z.ai/model-api) (International) | [https://open.bigmodel.cn/](https://open.bigmodel.cn/) (China)

### Getting an API Key

#### International Users

1. **Sign Up/Sign In:** Go to [https://z.ai/model-api](https://z.ai/model-api). Create an account or sign in.
2. **Navigate to API Keys:** Access your account dashboard and find the API keys section.
3. **Create a Key:** Generate a new API key for your application.
4. **Copy the Key:** Copy the API key immediately and store it securely.

#### China Mainland Users

1. **Sign Up/Sign In:** Go to [https://open.bigmodel.cn/](https://open.bigmodel.cn/). Create an account or sign in.
2. **Navigate to API Keys:** Access your account dashboard and find the API keys section.
3. **Create a Key:** Generate a new API key for your application.
4. **Copy the Key:** Copy the API key immediately and store it securely.

### Supported Models

Z AI provides different model catalogs based on your selected region. Both regions share the same model lineup:

#### GLM-5.1 (Latest)

* `glm-5.1` (Default) - Latest flagship model with 200K context window, 128K maximum output, and prompt caching ($1.40/$4.40 per 1M tokens; cached input \$0.26 per 1M tokens)

#### GLM-5

* `glm-5` - Flagship model with 200K context window and prompt caching ($1.00/$3.20 per 1M tokens)

#### GLM-4.7

* `glm-4.7` - High-performance model with 200K context and prompt caching ($0.60/$2.20 per 1M tokens)

#### GLM-4.6

* `glm-4.6` - Advanced model with 200K context and prompt caching ($0.60/$2.20 per 1M tokens)

#### GLM-4.5 Series

* `glm-4.5` - Flagship model with 131K context, prompt caching, and hybrid reasoning
* `glm-4.5-air` - Compact, cost-effective model with 128K context and prompt caching

All models feature:

* **Prompt caching support** for reduced costs on repeated queries
* **Mixture of Experts (MoE) architecture** for optimal performance
* **Agent-native design** integrating reasoning, coding, and tool usage
* **Open-source availability** under MIT license

**Note:** Pricing differs between International and China regions. China region pricing is approximately 50% lower.

### Configuration in Cline

1. **Open Cline Settings:** Click the settings icon (⚙️) in the Cline panel.
2. **Select Provider:** Choose "Z AI" from the "API Provider" dropdown.
3. **Select Region:** Choose your region:
   * "International" for global access
   * "China" for mainland China access
4. **Enter API Key:** Paste your Z AI API key into the "Z AI API Key" field.
5. **Select Model:** Choose your desired model from the "Model" dropdown.

### GLM Coding Plans

Z AI offers subscription plans specifically designed for coding applications. These plans provide cost-effective access to GLM-4.5 models through a prompt-based structure rather than traditional API usage billing.

#### Plan Options

**GLM Coding Lite** - \$3/month

* 120 prompts per 5-hour cycle
* Access to GLM-4.5 model
* Works exclusively through coding tools like Cline

**GLM Coding Pro** - \$15/month

* 600 prompts per 5-hour cycle
* Access to GLM-4.5 model
* Works exclusively through coding tools like Cline

Both plans offer promotional pricing for the first month: Lite drops from \$6 to \$3, Pro drops from \$30 to \$15.

<Frame>
  <img src="https://storage.googleapis.com/cline_public_images/docs/assets/zAI-coding-plan.png" alt="zAI subscription page showing GLM Coding Lite and Pro plans with pricing" />
</Frame>

#### Setting up GLM Coding Plans

To use the GLM Coding Plans with Cline:

1. **Subscribe:** Go to [https://z.ai/subscribe](https://z.ai/subscribe) and choose your plan.

2. **Create API Key:** After subscribing, log into your zAI dashboard and create an API key for your coding plan.

3. **Configure in Cline:** Open Cline settings, select "Z AI" as your provider, and paste your API key into the "Z AI API Key" field.

<Frame>
  <img src="https://storage.googleapis.com/cline_public_images/docs/assets/zAI-provider.png" alt="Cline settings with zAI provider selected and API key field highlighted" />
</Frame>

The setup connects your subscription directly to Cline, giving you access to GLM-4.5's tool-calling capabilities optimized for coding workflows.

### Z AI's Hybrid Intelligence

Z AI's GLM-4.5 series introduces revolutionary capabilities that set it apart from conventional language models:

#### Hybrid Reasoning Architecture

GLM-4.5 operates in two distinct modes:

* **Thinking Mode:** Designed for complex reasoning tasks and tool usage, engaging in deeper analytical processes
* **Non-Thinking Mode:** Provides immediate responses for straightforward queries, optimizing efficiency

This dual-mode architecture represents an "agent-native" design philosophy that adapts processing intensity based on query complexity.

#### Exceptional Performance

GLM-4.5 achieves a comprehensive score of **63.2** across 12 benchmarks spanning agentic tasks, reasoning, and coding challenges, securing **3rd place** among all proprietary and open-source models. GLM-4.5-Air maintains competitive performance with a score of **59.8** while delivering superior efficiency.

#### Mixture of Experts Excellence

The sophisticated MoE architecture optimizes performance while maintaining computational efficiency:

* **GLM-4.5:** 355B total parameters with 32B active parameters
* **GLM-4.5-Air:** 106B total parameters with 12B active parameters

#### Extended Context Capabilities

The 128,000-token context window enables comprehensive understanding of lengthy documents and codebases, with real-world testing confirming effective processing of nearly 2,000-line codebases while maintaining remarkable performance.

#### Open-Source Leadership

Released under MIT license, GLM-4.5 provides researchers and developers with access to state-of-the-art capabilities without proprietary restrictions, including base models, hybrid reasoning versions, and optimized FP8 variants.

### Regional Optimization

#### API Endpoints

* **International:** Uses `https://api.z.ai/api/paas/v4`
* **China:** Uses `https://open.bigmodel.cn/api/paas/v4`

#### Model Availability

The region setting determines both API endpoint and available models, with automatic filtering to ensure compatibility with your selected region.

### Special Features

#### Agentic Capabilities

GLM-4.5's unified architecture makes it particularly suitable for complex intelligent agent applications requiring integrated reasoning, coding, and tool utilization capabilities.

#### Comprehensive Benchmarking

Performance evaluation encompasses:

* **3 agentic task benchmarks**
* **7 reasoning benchmarks**
* **2 coding benchmarks**

This comprehensive assessment demonstrates versatility across diverse AI applications.

#### Developer Integration

Models support integration through multiple frameworks:

* **transformers**
* **vLLM**
* **SGLang**

Complete with dedicated model code, tool parser, and reasoning parser implementations.

### Performance Comparisons

#### vs Claude 4 Sonnet

GLM-4.5 shows competitive performance in agentic coding and reasoning tasks, though Claude Sonnet 4 maintains advantages in coding success rates and autonomous multi-feature application development.

#### vs GPT-4.5

GLM-4.5 ranks competitively in reasoning and agent benchmarks, with GPT-4.5 generally leading in raw task accuracy on professional benchmarks like MMLU and AIME.

### Tips and Notes

* **Region Selection:** Choose the appropriate region for optimal performance and compliance with local regulations.
* **Model Selection:** GLM-4.5 for maximum performance, GLM-4.5-Air for efficiency and mainstream hardware compatibility.
* **Context Advantage:** Large 128K context window enables processing of substantial codebases and documents.
* **Open Source Benefits:** MIT license enables both commercial use and secondary development.
* **Agentic Applications:** Particularly strong for applications requiring reasoning, coding, and tool usage integration.
* **Hybrid Reasoning:** Use Thinking Mode for complex problems, Non-Thinking Mode for simple queries.
* **API Compatibility:** OpenAI-compatible API provides streaming responses and usage reporting.
* **Framework Support:** Multiple integration options available for different deployment scenarios.
