Z AI (formerly Zhipu AI) offers the GLM-4.5 series, featuring hybrid reasoning capabilities and an agent-native design. Released in July 2025, these models target unified reasoning, coding, and intelligent agent applications, and their weights are published under the MIT license.
Website: https://z.ai/model-api (International) | https://open.bigmodel.cn/ (China)

Getting an API Key

International Users

  1. Sign Up/Sign In: Go to https://z.ai/model-api. Create an account or sign in.
  2. Navigate to API Keys: Access your account dashboard and find the API keys section.
  3. Create a Key: Generate a new API key for your application.
  4. Copy the Key: Copy the API key immediately and store it securely.

China Mainland Users

  1. Sign Up/Sign In: Go to https://open.bigmodel.cn/. Create an account or sign in.
  2. Navigate to API Keys: Access your account dashboard and find the API keys section.
  3. Create a Key: Generate a new API key for your application.
  4. Copy the Key: Copy the API key immediately and store it securely.
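
A quick way to confirm a new key works is to call the OpenAI-compatible endpoint directly. The sketch below is illustrative rather than official setup code: it assumes the key is stored in a `ZAI_API_KEY` environment variable, uses the international base URL listed later in this guide, and guesses `glm-4.5` as the model ID (check your dashboard for the exact name). China mainland users would swap in https://open.bigmodel.cn/api/paas/v4.

```python
import os
from openai import OpenAI

# ZAI_API_KEY is an assumed environment variable name; never hard-code keys.
client = OpenAI(
    api_key=os.environ["ZAI_API_KEY"],
    base_url="https://api.z.ai/api/paas/v4",  # China: https://open.bigmodel.cn/api/paas/v4
)

# Minimal sanity check: one short chat completion.
response = client.chat.completions.create(
    model="glm-4.5",  # assumed model ID; confirm the exact name in your Z AI dashboard
    messages=[{"role": "user", "content": "Reply with 'ok' if you can read this."}],
    max_tokens=10,
)
print(response.choices[0].message.content)
```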

Supported Models

Z AI provides different model catalogs based on your selected region:

GLM-4.5 Series

  • GLM-4.5 - Flagship model with 355B total parameters, 32B active parameters
  • GLM-4.5-Air - Compact model with 106B total parameters, 12B active parameters

GLM-4.5 Hybrid Reasoning Models

  • GLM-4.5 (Thinking Mode) - Advanced reasoning with step-by-step analysis
  • GLM-4.5-Air (Thinking Mode) - Efficient reasoning for mainstream hardware

All models feature:
  • 128,000 token context window for extensive document processing
  • Mixture of Experts (MoE) architecture for optimal performance
  • Agent-native design integrating reasoning, coding, and tool usage
  • Open-source availability under MIT license

Configuration in Cline

  1. Open Cline Settings: Click the settings icon (⚙️) in the Cline panel.
  2. Select Provider: Choose “Z AI” from the “API Provider” dropdown.
  3. Select Region: Choose your region:
    • “International” for global access
    • “China” for mainland China access
  4. Enter API Key: Paste your Z AI API key into the “Z AI API Key” field.
  5. Select Model: Choose your desired model from the “Model” dropdown.

GLM Coding Plans

Z AI offers subscription plans designed specifically for coding applications. These plans provide cost-effective access to GLM-4.5 models through a prompt-based quota rather than per-token API usage billing.

Plan Options

GLM Coding Lite - $3/month
  • 120 prompts per 5-hour cycle
  • Access to GLM-4.5 model
  • Works exclusively through coding tools like Cline
GLM Coding Pro - $15/month
  • 600 prompts per 5-hour cycle
  • Access to GLM-4.5 model
  • Works exclusively through coding tools like Cline
Both plans offer promotional pricing for the first month: Lite drops from $6 to $3, Pro drops from $30 to $15.
[Image: Z AI subscription page showing the GLM Coding Lite and Pro plans with pricing]

Setting up GLM Coding Plans

To use the GLM Coding Plans with Cline:
  1. Subscribe: Go to https://z.ai/subscribe and choose your plan.
  2. Create API Key: After subscribing, log into your Z AI dashboard and create an API key for your coding plan.
  3. Configure in Cline: Open Cline settings, select “Z AI” as your provider, and paste your API key into the “Z AI API Key” field.
[Image: Cline settings with the Z AI provider selected and the API key field highlighted]
The setup connects your subscription directly to Cline, giving you access to GLM-4.5’s tool-calling capabilities optimized for coding workflows.

Z AI’s Hybrid Intelligence

Z AI’s GLM-4.5 series introduces several capabilities that distinguish it from conventional language models:

Hybrid Reasoning Architecture

GLM-4.5 operates in two distinct modes:
  • Thinking Mode: Designed for complex reasoning tasks and tool usage, engaging in deeper analytical processes
  • Non-Thinking Mode: Provides immediate responses for straightforward queries, optimizing efficiency
This dual-mode architecture represents an “agent-native” design philosophy that adapts processing intensity based on query complexity.
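
When calling the API directly rather than through Cline, the reasoning mode can typically be toggled per request. The sketch below is a minimal illustration against the OpenAI-compatible endpoint; the `thinking` request field is an assumption about Z AI's request schema, so verify the exact parameter name in the official API reference.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["ZAI_API_KEY"],  # assumed environment variable
    base_url="https://api.z.ai/api/paas/v4",
)

def ask(prompt: str, thinking: bool):
    """Send one request, toggling the hybrid reasoning mode.

    The `thinking` extra_body field is an assumed parameter name;
    consult Z AI's API reference for the exact schema.
    """
    return client.chat.completions.create(
        model="glm-4.5",  # assumed model ID
        messages=[{"role": "user", "content": prompt}],
        extra_body={"thinking": {"type": "enabled" if thinking else "disabled"}},
    )

# Complex, multi-step problem: use Thinking Mode.
print(ask("Plan a refactor of a 2,000-line module into packages.", thinking=True).choices[0].message.content)

# Simple lookup: Non-Thinking Mode responds faster.
print(ask("What does HTTP status 404 mean?", thinking=False).choices[0].message.content)
```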

Exceptional Performance

GLM-4.5 achieves a comprehensive score of 63.2 across 12 benchmarks spanning agentic tasks, reasoning, and coding challenges, securing 3rd place among all proprietary and open-source models. GLM-4.5-Air maintains competitive performance with a score of 59.8 while delivering superior efficiency.

Mixture of Experts Excellence

The Mixture of Experts architecture activates only a fraction of each model's total parameters per token, which keeps inference efficient without sacrificing capability:
  • GLM-4.5: 355B total parameters with 32B active parameters
  • GLM-4.5-Air: 106B total parameters with 12B active parameters

Extended Context Capabilities

The 128,000-token context window enables comprehensive understanding of lengthy documents and codebases; real-world testing has shown it can process codebases of roughly 2,000 lines in a single pass while maintaining strong performance.

Open-Source Leadership

Released under MIT license, GLM-4.5 provides researchers and developers with access to state-of-the-art capabilities without proprietary restrictions, including base models, hybrid reasoning versions, and optimized FP8 variants.

Regional Optimization

API Endpoints

  • International: Uses https://api.z.ai/api/paas/v4
  • China: Uses https://open.bigmodel.cn/api/paas/v4
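
In practice the only difference between the regions is the base URL; the request and response formats follow the same OpenAI-compatible interface. A minimal sketch of picking the endpoint by region (the `ZAI_REGION` variable is purely an illustrative convention, not an official setting):

```python
import os
from openai import OpenAI

ENDPOINTS = {
    "international": "https://api.z.ai/api/paas/v4",
    "china": "https://open.bigmodel.cn/api/paas/v4",
}

# ZAI_REGION is an illustrative environment variable, not an official setting.
region = os.environ.get("ZAI_REGION", "international")

client = OpenAI(
    api_key=os.environ["ZAI_API_KEY"],  # assumed environment variable
    base_url=ENDPOINTS[region],
)
```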

Model Availability

The region setting determines both API endpoint and available models, with automatic filtering to ensure compatibility with your selected region.

Special Features

Agentic Capabilities

GLM-4.5’s unified architecture makes it particularly suitable for complex intelligent agent applications requiring integrated reasoning, coding, and tool utilization capabilities.

Comprehensive Benchmarking

Performance evaluation encompasses:
  • 3 agentic task benchmarks
  • 7 reasoning benchmarks
  • 2 coding benchmarks
This comprehensive assessment demonstrates versatility across diverse AI applications.

Developer Integration

Models support integration through multiple frameworks:
  • transformers
  • vLLM
  • SGLang
Each integration ships with dedicated model code, a tool parser, and a reasoning parser implementation.
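
Because the weights are openly published, you can also self-host a GLM-4.5 model and reuse the same OpenAI-compatible client code. The sketch below assumes vLLM's OpenAI-compatible server is already running locally on its default port and serving a GLM-4.5-Air checkpoint under the name shown; both the port and the model identifier are assumptions to adapt to your deployment.

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server listens on http://localhost:8000/v1 by default.
# The api_key value is ignored unless the server was started with one configured.
client = OpenAI(api_key="not-needed", base_url="http://localhost:8000/v1")

response = client.chat.completions.create(
    model="zai-org/GLM-4.5-Air",  # assumed checkpoint name; match whatever your server registers
    messages=[{"role": "user", "content": "Write a one-line Python hello world."}],
)
print(response.choices[0].message.content)
```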

Performance Comparisons

vs Claude Sonnet 4

GLM-4.5 shows competitive performance in agentic coding and reasoning tasks, though Claude Sonnet 4 maintains advantages in coding success rates and autonomous multi-feature application development.

vs GPT-4.5

GLM-4.5 ranks competitively in reasoning and agent benchmarks, with GPT-4.5 generally leading in raw task accuracy on knowledge and math benchmarks such as MMLU and AIME.

Tips and Notes

  • Region Selection: Choose the appropriate region for optimal performance and compliance with local regulations.
  • Model Selection: GLM-4.5 for maximum performance, GLM-4.5-Air for efficiency and mainstream hardware compatibility.
  • Context Advantage: Large 128K context window enables processing of substantial codebases and documents.
  • Open Source Benefits: MIT license enables both commercial use and secondary development.
  • Agentic Applications: Particularly strong for applications requiring reasoning, coding, and tool usage integration.
  • Hybrid Reasoning: Use Thinking Mode for complex problems, Non-Thinking Mode for simple queries.
  • API Compatibility: The OpenAI-compatible API provides streaming responses and usage reporting (see the streaming sketch after this list).
  • Framework Support: Multiple integration options available for different deployment scenarios.
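
Because the API follows the OpenAI format, streaming works the same way as with the OpenAI SDK. A minimal sketch, again assuming a `ZAI_API_KEY` environment variable, the international endpoint, an assumed `glm-4.5` model ID, and that the endpoint honors `stream_options` for usage reporting:

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["ZAI_API_KEY"],
    base_url="https://api.z.ai/api/paas/v4",
)

# stream=True yields incremental chunks; with include_usage, token usage
# arrives on the final chunk (support for this option is an assumption).
stream = client.chat.completions.create(
    model="glm-4.5",
    messages=[{"role": "user", "content": "Summarize what a Mixture of Experts model is."}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.usage:
        print(f"\n[prompt tokens: {chunk.usage.prompt_tokens}, "
              f"completion tokens: {chunk.usage.completion_tokens}]")
```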