Baseten provides on-demand frontier model APIs designed for production applications, not just experimentation. Built on the Baseten Inference Stack, these APIs deliver enterprise-grade performance and reliability with optimized inference for leading open-source models from OpenAI, DeepSeek, Meta, Moonshot AI, and Alibaba Cloud. Website: https://www.baseten.co/products/model-apis/

Getting an API Key

  1. Sign Up/Sign In: Go to Baseten and create an account or sign in.
  2. Navigate to API Keys: Access your dashboard and go to the API Keys section.
  3. Create a Key: Generate a new API key. Give it a descriptive name (e.g., “Cline”).
  4. Copy the Key: Copy the API key immediately and store it securely.
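Once you have a key, a common pattern for storing it securely is to keep it in an environment variable and send it as a standard Bearer token. A minimal Python sketch (the `BASETEN_API_KEY` variable name is a convention assumed here, not something Baseten mandates):

```python
import os

# Read the key from the environment rather than hard-coding it in source.
# BASETEN_API_KEY is an assumed variable name, not one Baseten requires.
api_key = os.environ.get("BASETEN_API_KEY", "")
if not api_key:
    print("Warning: BASETEN_API_KEY is not set.")

# Baseten's OpenAI-compatible API accepts the key as a Bearer token.
headers = {"Authorization": f"Bearer {api_key}"}
```

Keeping the key out of source files also means it never ends up in version control.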

Supported Models

Cline supports all current models under Baseten Model APIs. For the most up-to-date pricing, visit: https://www.baseten.co/products/model-apis/
Reasoning Models:
  • deepseek-ai/DeepSeek-R1 - DeepSeek’s first-generation reasoning model (163K context) - $2.55/$5.95 per 1M tokens
  • deepseek-ai/DeepSeek-R1-0528 - Latest revision of DeepSeek’s reasoning model (163K context) - $2.55/$5.95 per 1M tokens
  • deepseek-ai/DeepSeek-V3.1 - Hybrid reasoning with advanced tool calling (163K context) - $0.50/$1.50 per 1M tokens
  • deepseek-ai/DeepSeek-V3-0324 - Fast general-purpose with enhanced reasoning (163K context) - $0.77/$0.77 per 1M tokens
Flagship Models:
  • openai/gpt-oss-120b (OpenAI) - 120B MoE with strong reasoning capabilities (128K context) - $0.10/$0.50 per 1M tokens
  • moonshotai/Kimi-K2-Instruct (Moonshot AI) - 1 trillion parameter model for agentic tasks (131K context) - $0.60/$2.50 per 1M tokens
  • moonshotai/Kimi-K2-Instruct-0905 (Moonshot AI) - September update with enhanced capabilities (262K context) - $0.60/$2.50 per 1M tokens
Meta Llama 4 Series:
  • meta-llama/Llama-4-Maverick-17B-128E-Instruct - High-efficiency processing (1M context!) - $0.19/$0.72 per 1M tokens
  • meta-llama/Llama-4-Scout-17B-16E-Instruct - Precise context understanding (1M context!) - $0.13/$0.50 per 1M tokens
Coding Specialists:
  • Qwen/Qwen3-Coder-480B-A35B-Instruct - Advanced coding and reasoning (262K context) - $0.38/$1.53 per 1M tokens
  • Qwen/Qwen3-235B-A22B-Instruct-2507 - Math and reasoning expert (262K context) - $0.22/$0.80 per 1M tokens

Configuration in Cline

  1. Open Cline Settings: Click the settings icon (⚙️) in the Cline panel.
  2. Select Provider: Choose “Baseten” from the “API Provider” dropdown.
  3. Enter API Key: Paste your Baseten API key into the “Baseten API Key” field.
  4. Select Model: Choose your desired model from the “Model” dropdown.

Production-First Architecture

Baseten’s Model APIs are built for production environments with several key advantages:

Enterprise-Grade Reliability

  • Four nines of uptime (99.99%) through active-active redundancy
  • Cloud-agnostic, multi-cluster autoscaling for consistent availability
  • SOC 2 Type II certified and HIPAA compliant for security requirements

Optimized Performance

  • Pre-optimized models shipped with the Baseten Inference Stack
  • Latest-generation GPUs with multi-cloud infrastructure
  • Ultra-fast inference optimized from the ground up for production workloads

Cost Efficiency

  • 5-10x less expensive than closed alternatives
  • Optimized multi-cloud infrastructure for efficient resource utilization
  • Transparent pricing with no hidden costs or rate limit surprises

Developer Experience

  • OpenAI-compatible API - migrate by swapping a single URL
  • Drop-in replacement for closed models with comprehensive observability
  • Seamless scaling from Model APIs to dedicated deployments

Special Features

Function Calling & Tool Use

All Baseten models support structured outputs, function calling, and tool use as part of the Baseten Inference Stack, making them ideal for agentic applications.
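Because the API is OpenAI-compatible, tools are declared in the standard OpenAI function-calling format. A minimal sketch, where the `read_file` tool and its schema are purely illustrative:

```python
# Illustrative tool definition in the OpenAI function-calling format.
# The tool name "read_file" and its parameter schema are hypothetical.
tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the current workspace.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Path to the file."},
                },
                "required": ["path"],
            },
        },
    }
]

# Tools are passed alongside the usual fields in a chat-completions request body:
request_body = {
    "model": "deepseek-ai/DeepSeek-V3.1",
    "messages": [{"role": "user", "content": "Open README.md"}],
    "tools": tools,
}
```

The model then responds with a tool call (name plus JSON arguments) instead of plain text when it decides a tool is needed, which is the mechanism agentic applications like Cline build on.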

Reasoning Capabilities

DeepSeek models offer enhanced reasoning with step-by-step thought processes while maintaining production-ready performance.

Long Context Support

  • Up to 1 million tokens for Llama 4 models (Maverick and Scout)
  • 262K tokens for Qwen3 models
  • 163K tokens for DeepSeek models
  • Perfect for code repositories and complex multi-turn conversations
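For a quick sanity check on whether a prompt might fit a given window, the rough rule of thumb of ~4 characters per English token can help. Treat this as an estimate only; real token counts vary by model and tokenizer:

```python
def fits_context(text: str, context_tokens: int, chars_per_token: int = 4) -> bool:
    """Rough estimate of whether `text` fits a context window.

    Uses the ~4 characters/token heuristic; actual counts depend
    on the model's tokenizer.
    """
    estimated_tokens = len(text) // chars_per_token
    return estimated_tokens <= context_tokens

# A ~4 MB prompt is roughly 1M tokens -- near the Llama 4 limit:
print(fits_context("x" * 4_000_000, 1_000_000))  # True by this estimate
print(fits_context("x" * 4_000_000, 163_000))    # False for a 163K window
```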

Quantization Optimizations

Models are deployed with advanced quantization techniques (fp4, fp8, fp16) for optimal performance while maintaining quality.

Migration from Other Providers

Baseten’s OpenAI compatibility makes migration straightforward.
From OpenAI:
  • Replace api.openai.com with inference.baseten.co/v1
  • Keep existing request/response formats
  • Benefit from significant cost savings
From Other Providers:
  • Use standard OpenAI SDK format
  • Maintain existing prompting strategies
  • Access to newer open-source models
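As a concrete sketch of the URL swap, the request below is built with the Python standard library so it stays self-contained; with the official OpenAI SDK you would instead pass `base_url="https://inference.baseten.co/v1"` to the client constructor. The model choice and placeholder key are illustrative:

```python
import json
import os
import urllib.request

# Only the base URL changes relative to OpenAI; the request body,
# headers, and response format stay the same.
BASETEN_BASE_URL = "https://inference.baseten.co/v1"

def build_chat_request(model: str, messages: list, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASETEN_BASE_URL}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "deepseek-ai/DeepSeek-V3-0324",
    [{"role": "user", "content": "Hello"}],
    os.environ.get("BASETEN_API_KEY", "sk-placeholder"),  # placeholder fallback
)
# urllib.request.urlopen(req) would send it; omitted here to avoid a live call.
print(req.full_url)
```

Because the wire format is unchanged, existing response-parsing code and prompting strategies carry over as-is.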

Tips and Notes

  • Model Selection: Choose models based on your specific use case - reasoning models for complex tasks, coding models for development work, and flagship models for general applications.
  • Cost Optimization: Baseten offers some of the most competitive pricing in the market, especially for open-source models.
  • Context Windows: Take advantage of large context windows (up to 1M tokens) for including substantial codebases and documentation.
  • Enterprise Ready: Baseten is designed for production use with enterprise-grade security, compliance, and reliability.
  • Dynamic Model Updates: Cline automatically fetches the latest model list from Baseten, ensuring access to new models as they’re released.
  • Multi-Cloud Capacity Management (MCM): Baseten’s multi-cloud infrastructure ensures high availability and low latency globally.
  • Support: Baseten provides dedicated support for production deployments and can work with you on dedicated resources as you scale.

Pricing Information

Current pricing is highly competitive and transparent. For the most up-to-date pricing, visit the Baseten Model APIs page. Prices typically range from $0.10-$6.00 per million tokens, making Baseten significantly more cost-effective than many closed-model alternatives while providing access to state-of-the-art open-source models.