OpenRouter Implementation Guide: From Setup to Optimization

This comprehensive guide walks you through implementing OpenRouter for cost savings and reliability, from initial setup through advanced optimization. Followed end to end, these steps can typically reduce AI costs by 20-50% while maintaining or improving service quality.

Prerequisites and Planning

Technical Requirements

Pre-Implementation Assessment

# 1. Audit current AI usage
# Document all current AI integrations
# - Which models are being used?
# - What are typical request volumes?
# - What are current monthly costs?

# 2. Identify integration points
# - Customer-facing applications
# - Internal tools and automation
# - Development and testing environments
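
If your current providers let you export request logs, a small script can turn them into a cost baseline to compare against after migrating. The sketch below assumes a hypothetical usage_log.csv export with model, prompt_tokens, completion_tokens, and cost_usd columns; adjust it to whatever your logging or billing export actually contains.

// audit-usage.js - Summarize current AI spend per model from an exported log (hypothetical format)
const fs = require('fs');

const rows = fs.readFileSync('usage_log.csv', 'utf8')
  .trim()
  .split('\n')
  .slice(1) // skip header: model,prompt_tokens,completion_tokens,cost_usd
  .map(line => {
    const [model, promptTokens, completionTokens, costUsd] = line.split(',');
    return { model, promptTokens: +promptTokens, completionTokens: +completionTokens, costUsd: +costUsd };
  });

const byModel = {};
for (const r of rows) {
  const agg = byModel[r.model] || { requests: 0, tokens: 0, cost: 0 };
  agg.requests += 1;
  agg.tokens += r.promptTokens + r.completionTokens;
  agg.cost += r.costUsd;
  byModel[r.model] = agg;
}

console.table(byModel); // Baseline to compare against post-migration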

Phase 1: Initial Setup (30 minutes)

Step 1: Account Creation and API Key Setup

# 1. Create OpenRouter account
# Visit: https://openrouter.ai/signup
# No credit card required for initial testing

# 2. Generate API key from dashboard
# Navigate to: Settings > API Keys
# Create new key with appropriate permissions

Step 2: Basic Integration Test

// Node.js example - Test basic connectivity
const OpenAI = require('openai');

const openai = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
  defaultHeaders: {
    "HTTP-Referer": "https://your-app.com", // Replace with your site
    "X-Title": "Your App Name", // Replace with your app name
  }
});

async function testConnection() {
  try {
    const response = await openai.chat.completions.create({
      model: "openai/gpt-4o-mini", // Start with cost-effective model
      messages: [
        { role: "user", content: "Hello! This is a test message." }
      ],
    });
    
    console.log("✅ OpenRouter connection successful");
    console.log("Response:", response.choices[0].message.content);
    console.log("Usage:", response.usage);
    
    return response;
  } catch (error) {
    console.error("❌ Connection failed:", error.message);
    throw error;
  }
}

testConnection();

Step 3: Environment Configuration

# Environment variables setup
cat > .env << EOF
# OpenRouter Configuration
OPENROUTER_API_KEY=your_api_key_here
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
OPENROUTER_APP_NAME=Your App Name
OPENROUTER_SITE_URL=https://your-app.com

# Backup configuration (for failover)
OPENAI_API_KEY=your_backup_key
ANTHROPIC_API_KEY=your_backup_key
EOF
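
With the .env file in place, it helps to fail fast when a required variable is missing rather than discovering it at request time. A minimal sketch using the dotenv package (one option among many for loading environment files):

// config.js - Load and validate OpenRouter environment variables (sketch)
require('dotenv').config();

const required = ['OPENROUTER_API_KEY', 'OPENROUTER_BASE_URL'];
const missing = required.filter(name => !process.env[name]);

if (missing.length > 0) {
  throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
}

module.exports = {
  apiKey: process.env.OPENROUTER_API_KEY,
  baseURL: process.env.OPENROUTER_BASE_URL,
  appName: process.env.OPENROUTER_APP_NAME || 'Unnamed App',
  siteUrl: process.env.OPENROUTER_SITE_URL || ''
};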

Phase 2: Production Integration (1-2 weeks)

Step 1: Drop-in Replacement Implementation

# Python example - Replace existing OpenAI client
import openai
import os
from typing import Optional

class OpenRouterClient:
    def __init__(self):
        self.client = openai.OpenAI(
            base_url="https://openrouter.ai/api/v1",
            api_key=os.getenv("OPENROUTER_API_KEY"),
            default_headers={
                "HTTP-Referer": os.getenv("OPENROUTER_SITE_URL"),
                "X-Title": os.getenv("OPENROUTER_APP_NAME"),
            }
        )
    
    def complete(self, messages, model="openai/gpt-4o-mini", **kwargs):
        """Drop-in replacement for OpenAI chat completions"""
        try:
            response = self.client.chat.completions.create(
                model=model,
                messages=messages,
                **kwargs
            )
            return response
        except Exception as e:
            print(f"OpenRouter request failed: {e}")
            # Implement fallback logic here
            raise

# Usage - minimal code changes required
client = OpenRouterClient()
response = client.complete([
    {"role": "user", "content": "Your message here"}
])
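
Note that OpenRouter model IDs include a provider prefix (for example, openai/gpt-4o-mini rather than gpt-4o-mini), so any hard-coded model names from an existing direct integration need to be updated as part of the switch.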

Step 2: Smart Model Selection Strategy

// Implement intelligent model selection
class SmartModelRouter {
  constructor() {
    this.modelTiers = {
      // Free tier - for development and testing
      free: [
        "meta-llama/llama-3.1-8b",
        "google/gemma-2-9b-it",
        "mistralai/mistral-7b-instruct"
      ],
      
      // Budget tier - cost-effective production
      budget: [
        "openai/gpt-4o-mini",
        "anthropic/claude-3-haiku",
        "google/gemini-1.5-flash"
      ],
      
      // Premium tier - highest quality
      premium: [
        "openai/gpt-4o",
        "anthropic/claude-3.5-sonnet",
        "google/gemini-1.5-pro"
      ]
    };
  }
  
  selectModel(taskType, budgetTier = 'budget', qualityRequired = 'medium') {
    const routingRules = {
      'code_generation': {
        low: 'deepseek/deepseek-coder',
        medium: 'openai/gpt-4o-mini',
        high: 'openai/gpt-4o'
      },
      'creative_writing': {
        low: 'meta-llama/llama-3.1-70b',
        medium: 'anthropic/claude-3.5-sonnet',
        high: 'anthropic/claude-3.5-sonnet'
      },
      'summarization': {
        low: 'openai/gpt-4o-mini',
        medium: 'openai/gpt-4o-mini',
        high: 'anthropic/claude-3.5-sonnet'
      },
      'general': {
        low: this.modelTiers.free[0],
        medium: this.modelTiers.budget[0],
        high: this.modelTiers.premium[0]
      }
    };
    
    return routingRules[taskType]?.[qualityRequired] || routingRules.general[qualityRequired];
  }
  
  async makeRequest(messages, taskType, options = {}) {
    const model = this.selectModel(
      taskType, 
      options.budgetTier, 
      options.qualityRequired
    );
    
    console.log(`🎯 Routing ${taskType} task to ${model}`);
    
    const response = await openai.chat.completions.create({
      model: model,
      messages: messages,
      ...options
    });
    
    // Log cost information
    this.logCostData(response, model, taskType);
    
    return response;
  }
  
  logCostData(response, model, taskType) {
    const usage = response.usage;
    console.log(`📊 Request completed:`, {
      model,
      taskType,
      promptTokens: usage.prompt_tokens,
      completionTokens: usage.completion_tokens,
      totalTokens: usage.total_tokens
    });
  }
}

// Usage example
const router = new SmartModelRouter();

// Route different types of requests optimally
const summaryResponse = await router.makeRequest(
  [{ role: "user", content: "Summarize this document..." }],
  "summarization",
  { qualityRequired: "medium" }
);

const codeResponse = await router.makeRequest(
  [{ role: "user", content: "Write a Python function to..." }],
  "code_generation", 
  { qualityRequired: "high" }
);

Step 3: Cost Tracking and Attribution

// Advanced cost tracking implementation
interface CostTracker {
  requestId: string;
  timestamp: Date;
  model: string;
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  estimatedCost: number;
  department: string;
  project: string;
  taskType: string;
}

class OpenRouterCostManager {
  private costs: CostTracker[] = [];
  private budgets: Map<string, number> = new Map();
  
  constructor() {
    // Set department budgets
    this.budgets.set('engineering', 5000);
    this.budgets.set('marketing', 2000);
    this.budgets.set('support', 1000);
  }
  
  async trackRequest(
    response: any,
    model: string,
    department: string,
    project: string,
    taskType: string
  ) {
    const cost: CostTracker = {
      requestId: this.generateId(),
      timestamp: new Date(),
      model,
      promptTokens: response.usage.prompt_tokens,
      completionTokens: response.usage.completion_tokens,
      totalTokens: response.usage.total_tokens,
      estimatedCost: this.calculateCost(response.usage, model),
      department,
      project,
      taskType
    };
    
    this.costs.push(cost);
    
    // Check budget alerts
    this.checkBudgetAlerts(department);
    
    // Log to monitoring system
    this.logToMonitoring(cost);
    
    return cost;
  }
  
  calculateCost(usage: any, model: string): number {
    // Simplified static pricing table (USD per 1K tokens) - in production, use real-time pricing
    const modelCosts: Record<string, { input: number; output: number }> = {
      'openai/gpt-4o-mini': { input: 0.00015, output: 0.0006 },
      'openai/gpt-4o': { input: 0.0025, output: 0.01 },
      'anthropic/claude-3-haiku': { input: 0.00025, output: 0.00125 },
      'anthropic/claude-3.5-sonnet': { input: 0.003, output: 0.015 }
    };
    
    const pricing = modelCosts[model] || { input: 0.001, output: 0.002 };
    return (usage.prompt_tokens * pricing.input + usage.completion_tokens * pricing.output) / 1000;
  }
  
  checkBudgetAlerts(department: string) {
    const departmentSpend = this.getDepartmentSpend(department);
    const budget = this.budgets.get(department);
    if (!budget) return; // Skip alerts when no budget is configured
    
    const usage = departmentSpend / budget;
    
    if (usage > 0.8) {
      console.warn(`⚠️ ${department} at ${(usage * 100).toFixed(1)}% of monthly budget`);
      // Implement alert logic (email, Slack, etc.)
      this.sendBudgetAlert(department, usage);
    }
  }
  
  getDepartmentSpend(department: string): number {
    const now = new Date();
    return this.costs
      .filter(c =>
        c.department === department &&
        c.timestamp.getMonth() === now.getMonth() &&
        c.timestamp.getFullYear() === now.getFullYear())
      .reduce((sum, c) => sum + c.estimatedCost, 0);
  }
  
  generateDepartmentReport(department: string) {
    const departmentCosts = this.costs.filter(c => c.department === department);
    const totalSpend = departmentCosts.reduce((sum, c) => sum + c.estimatedCost, 0);
    const requestCount = departmentCosts.length;
    const avgCostPerRequest = requestCount > 0 ? totalSpend / requestCount : 0;
    
    return {
      department,
      totalSpend,
      requestCount,
      avgCostPerRequest,
      budget: this.budgets.get(department),
      budgetUtilization: totalSpend / (this.budgets.get(department) || 1),
      topProjects: this.getTopProjects(departmentCosts),
      modelUsage: this.getModelUsage(departmentCosts)
    };
  }
  
  private sendBudgetAlert(department: string, usage: number) {
    // Implement your alert mechanism
    console.log(`🚨 Budget Alert: ${department} department at ${(usage * 100).toFixed(1)}% usage`);
  }
  
  private logToMonitoring(cost: CostTracker) {
    // Send to your monitoring system (DataDog, New Relic, etc.)
    console.log('📈 Cost logged:', cost);
  }
  
  private generateId(): string {
    return Math.random().toString(36).substring(2, 15);
  }
  
  private getTopProjects(costs: CostTracker[]) {
    const projectSpend = costs.reduce((acc, c) => {
      acc[c.project] = (acc[c.project] || 0) + c.estimatedCost;
      return acc;
    }, {} as Record<string, number>);
    
    return Object.entries(projectSpend)
      .sort(([, a], [, b]) => b - a)
      .slice(0, 5);
  }
  
  private getModelUsage(costs: CostTracker[]) {
    const modelUsage = costs.reduce((acc, c) => {
      acc[c.model] = (acc[c.model] || 0) + 1;
      return acc;
    }, {} as Record<string, number>);
    
    return Object.entries(modelUsage)
      .sort(([, a], [, b]) => b - a);
  }
}
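
The static pricing table inside calculateCost will drift out of date. OpenRouter's GET /api/v1/models endpoint (the same one used for the key check in the troubleshooting section below) also returns pricing metadata for each model, which can be used to refresh the table at runtime. A sketch, assuming the response exposes per-token prices under pricing.prompt and pricing.completion; verify the field names against the current API reference before relying on them:

// refresh-pricing.js - Build a per-1K-token pricing table from the OpenRouter models endpoint (sketch)
// Uses the global fetch available in Node 18+
async function fetchModelPricing() {
  const res = await fetch('https://openrouter.ai/api/v1/models', {
    headers: { Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}` }
  });
  const { data } = await res.json();

  const pricing = {};
  for (const model of data) {
    // Assumed field names: pricing.prompt / pricing.completion, quoted in USD per token
    pricing[model.id] = {
      input: Number(model.pricing.prompt) * 1000,   // convert to USD per 1K tokens
      output: Number(model.pricing.completion) * 1000
    };
  }
  return pricing;
}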

Phase 3: Advanced Optimization (2-4 weeks)

Step 1: Implement Intelligent Caching

// Redis-based caching for cost optimization
const redis = require('redis');
const crypto = require('crypto');

class OpenRouterCache {
  constructor() {
    this.client = redis.createClient();
    // node-redis v4+ requires an explicit connection before commands are issued
    this.client.connect().catch(err => console.warn('Redis connection error:', err.message));
    this.defaultTTL = 3600; // 1 hour
  }
  
  generateCacheKey(messages, model, temperature = 0.7) {
    // Create deterministic cache key
    const content = JSON.stringify({
      messages: messages.map(m => ({ role: m.role, content: m.content })),
      model,
      temperature: Math.round(temperature * 10) / 10 // Round to 1 decimal
    });
    
    return `openrouter:${crypto.createHash('sha256').update(content).digest('hex')}`;
  }
  
  async getCachedResponse(cacheKey) {
    try {
      const cached = await this.client.get(cacheKey);
      if (cached) {
        console.log('✅ Cache hit - saving API cost');
        return JSON.parse(cached);
      }
      return null;
    } catch (error) {
      console.warn('Cache read error:', error.message);
      return null;
    }
  }
  
  async cacheResponse(cacheKey, response, ttl = this.defaultTTL) {
    try {
      await this.client.setEx(cacheKey, ttl, JSON.stringify(response));
      console.log('💾 Response cached successfully');
    } catch (error) {
      console.warn('Cache write error:', error.message);
    }
  }
  
  async makeRequestWithCache(messages, model, options = {}) {
    // Separate cache controls from the options forwarded to the API
    const { temperature = 0.7, useCache = true, cacheTTL, ...requestOptions } = options;
    
    // Check cache for deterministic requests (low temperature)
    if (useCache && temperature <= 0.3) {
      const cacheKey = this.generateCacheKey(messages, model, temperature);
      const cached = await this.getCachedResponse(cacheKey);
      
      if (cached) {
        return { ...cached, fromCache: true };
      }
      
      // Make API request
      const response = await openai.chat.completions.create({
        model,
        messages,
        temperature,
        ...requestOptions
      });
      
      // Cache the response
      await this.cacheResponse(cacheKey, response, cacheTTL);
      
      return { ...response, fromCache: false };
    }
    
    // Non-cacheable request (high temperature/creativity)
    return await openai.chat.completions.create({
      model,
      messages,
      temperature,
      ...requestOptions
    });
  }
}

// Usage
const cache = new OpenRouterCache();

// This request will be cached (deterministic)
const response1 = await cache.makeRequestWithCache(
  [{ role: "user", content: "What is 2+2?" }],
  "openai/gpt-4o-mini",
  { temperature: 0.1, useCache: true }
);

// This request won't be cached (high creativity)
const response2 = await cache.makeRequestWithCache(
  [{ role: "user", content: "Write a creative story about..." }],
  "anthropic/claude-3.5-sonnet",
  { temperature: 0.9, useCache: false }
);

Step 2: Failover and Reliability Implementation

// Robust failover system
class OpenRouterFailover {
  constructor() {
    this.providers = [
      {
        name: 'openrouter',
        client: new OpenAI({
          baseURL: "https://openrouter.ai/api/v1",
          apiKey: process.env.OPENROUTER_API_KEY
        }),
        priority: 1,
        healthy: true
      },
      {
        name: 'direct_openai',
        client: new OpenAI({
          apiKey: process.env.OPENAI_API_KEY
        }),
        priority: 2,
        healthy: true,
        modelMapping: {
          'openai/gpt-4o': 'gpt-4o',
          'openai/gpt-4o-mini': 'gpt-4o-mini'
        }
      }
    ];
    
    this.circuitBreaker = new Map();
  }
  
  async makeResilientRequest(messages, model, options = {}) {
    // Keep retry/timeout controls out of the parameters forwarded to the API
    const { maxRetries = 2, timeoutMs = 30000, ...requestOptions } = options;
    
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      const provider = this.selectProvider(model);
      
      if (!provider) {
        throw new Error('No healthy providers available');
      }
      
      try {
        console.log(`🔄 Attempt ${attempt + 1} using ${provider.name}`);
        
        const mappedModel = provider.modelMapping?.[model] || model;
        
        const response = await this.makeRequestWithTimeout(
          provider.client,
          messages,
          mappedModel,
          requestOptions,
          timeoutMs
        );
        
        // Request succeeded, reset circuit breaker
        this.circuitBreaker.delete(provider.name);
        
        return {
          ...response,
          provider: provider.name,
          attempt: attempt + 1
        };
        
      } catch (error) {
        console.warn(`❌ ${provider.name} failed:`, error.message);
        
        // Update circuit breaker
        this.updateCircuitBreaker(provider.name, error);
        
        // Mark provider as unhealthy for certain errors
        if (this.isFatalError(error)) {
          provider.healthy = false;
          console.warn(`🚫 Marking ${provider.name} as unhealthy`);
        }
        
        // If this was the last attempt, throw the error
        if (attempt === maxRetries) {
          throw new Error(`All providers failed. Last error: ${error.message}`);
        }
      }
    }
  }
  
  selectProvider(model) {
    // Filter healthy providers that support the model
    const availableProviders = this.providers
      .filter(p => p.healthy && !this.isCircuitBreakerOpen(p.name))
      .filter(p => this.providerSupportsModel(p, model))
      .sort((a, b) => a.priority - b.priority);
    
    return availableProviders[0] || null;
  }
  
  providerSupportsModel(provider, model) {
    // OpenRouter supports all models
    if (provider.name === 'openrouter') return true;
    
    // Check if direct provider supports the model
    if (provider.modelMapping) {
      return model in provider.modelMapping;
    }
    
    return true; // Assume support if no mapping defined
  }
  
  async makeRequestWithTimeout(client, messages, model, options, timeout) {
    return Promise.race([
      client.chat.completions.create({
        model,
        messages,
        ...options
      }),
      new Promise((_, reject) => 
        setTimeout(() => reject(new Error('Request timeout')), timeout)
      )
    ]);
  }
  
  updateCircuitBreaker(providerName, error) {
    const now = Date.now();
    const failures = this.circuitBreaker.get(providerName) || [];
    
    // Add current failure
    failures.push({ timestamp: now, error: error.message });
    
    // Remove failures older than 5 minutes
    const recentFailures = failures.filter(f => now - f.timestamp < 300000);
    
    this.circuitBreaker.set(providerName, recentFailures);
    
    // Open circuit breaker if too many recent failures
    if (recentFailures.length >= 5) {
      console.warn(`🔌 Circuit breaker opened for ${providerName}`);
    }
  }
  
  isCircuitBreakerOpen(providerName) {
    const failures = this.circuitBreaker.get(providerName) || [];
    const recentFailures = failures.filter(f => Date.now() - f.timestamp < 300000);
    
    return recentFailures.length >= 5;
  }
  
  isFatalError(error) {
    const fatalErrors = [
      'invalid_api_key',
      'account_disabled',
      'billing_hard_limit_reached'
    ];
    
    return fatalErrors.some(fatal => error.message.toLowerCase().includes(fatal));
  }
  
  // Health check method to run periodically
  async healthCheck() {
    for (const provider of this.providers) {
      try {
        await this.makeRequestWithTimeout(
          provider.client,
          [{ role: "user", content: "ping" }],
          provider.name === 'openrouter' ? 'openai/gpt-4o-mini' : 'gpt-4o-mini',
          { max_tokens: 1 },
          5000
        );
        
        provider.healthy = true;
        console.log(`✅ ${provider.name} health check passed`);
        
      } catch (error) {
        console.warn(`❌ ${provider.name} health check failed:`, error.message);
        
        if (this.isFatalError(error)) {
          provider.healthy = false;
        }
      }
    }
  }
}

// Usage with comprehensive error handling
const failover = new OpenRouterFailover();

// Run health checks every 5 minutes
setInterval(() => failover.healthCheck(), 300000);

async function robustAIRequest(messages, model, options = {}) {
  try {
    const response = await failover.makeResilientRequest(messages, model, options);
    console.log(`✅ Request successful via ${response.provider}`);
    return response;
  } catch (error) {
    console.error('🚨 All AI providers failed:', error.message);
    
    // Implement fallback logic (cached response, default response, etc.)
    return {
      choices: [{ 
        message: { 
          content: "I'm temporarily unable to process your request. Please try again later." 
        } 
      }],
      usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 },
      error: true
    };
  }
}

Phase 4: Monitoring and Analytics (1 week)

Step 1: Comprehensive Dashboard Setup

// Dashboard metrics collection
class OpenRouterAnalytics {
  constructor() {
    this.metrics = {
      requests: 0,
      totalCost: 0,
      totalTokens: 0,
      averageLatency: 0,
      errorRate: 0,
      cacheHitRate: 0,
      modelUsage: new Map(),
      departmentCosts: new Map(),
      hourlyUsage: new Array(24).fill(0)
    };
  }
  
  recordRequest(request) {
    this.metrics.requests++;
    this.metrics.totalCost += request.cost || 0;
    this.metrics.totalTokens += request.totalTokens || 0;
    
    // Update model usage
    const modelCount = this.metrics.modelUsage.get(request.model) || 0;
    this.metrics.modelUsage.set(request.model, modelCount + 1);
    
    // Update department costs
    const deptCost = this.metrics.departmentCosts.get(request.department) || 0;
    this.metrics.departmentCosts.set(request.department, deptCost + (request.cost || 0));
    
    // Update hourly usage
    const hour = new Date().getHours();
    this.metrics.hourlyUsage[hour]++;
  }
  
  generateReport() {
    return {
      summary: {
        totalRequests: this.metrics.requests,
        totalCost: this.metrics.totalCost.toFixed(2),
        avgCostPerRequest: (this.metrics.totalCost / this.metrics.requests).toFixed(4),
        totalTokens: this.metrics.totalTokens,
        avgTokensPerRequest: Math.round(this.metrics.totalTokens / this.metrics.requests)
      },
      
      topModels: Array.from(this.metrics.modelUsage.entries())
        .sort(([,a], [,b]) => b - a)
        .slice(0, 5),
        
      departmentBreakdown: Array.from(this.metrics.departmentCosts.entries())
        .sort(([,a], [,b]) => b - a),
        
      usagePatterns: {
        peakHour: this.metrics.hourlyUsage.indexOf(Math.max(...this.metrics.hourlyUsage)),
        hourlyDistribution: this.metrics.hourlyUsage
      },
      
      recommendations: this.generateRecommendations()
    };
  }
  
  generateRecommendations() {
    const recommendations = [];
    
    // Cost optimization recommendations
    if (this.metrics.totalCost / this.metrics.requests > 0.01) {
      recommendations.push({
        type: 'cost_optimization',
        message: 'Consider using more cost-effective models for simpler tasks',
        impact: 'High',
        effort: 'Medium'
      });
    }
    
    // Cache optimization
    if (this.metrics.cacheHitRate < 0.3) {
      recommendations.push({
        type: 'cache_optimization',
        message: 'Implement caching for repetitive queries to reduce costs',
        impact: 'Medium',
        effort: 'Low'
      });
    }
    
    return recommendations;
  }
}
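
OpenRouterAnalytics only aggregates whatever records it is handed, so each request path needs to feed it one record per call. A minimal sketch of that wiring, reusing calculateCost from the cost manager above; the record shape matches what recordRequest reads (model, department, totalTokens, cost):

// Wiring analytics into the request path (sketch)
const analytics = new OpenRouterAnalytics();
const costManager = new OpenRouterCostManager();

async function trackedRequest(messages, model, department) {
  const response = await openai.chat.completions.create({ model, messages });

  analytics.recordRequest({
    model,
    department,
    totalTokens: response.usage.total_tokens,
    cost: costManager.calculateCost(response.usage, model)
  });

  return response;
}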

Phase 5: Production Optimization (Ongoing)

Step 1: A/B Testing for Model Selection

// A/B testing framework for model optimization
class ModelABTesting {
  constructor() {
    this.experiments = new Map();
  }
  
  createExperiment(name, models, trafficSplit) {
    this.experiments.set(name, {
      name,
      models,
      trafficSplit, // e.g., [0.5, 0.5] for 50/50 split
      metrics: models.map(() => ({ requests: 0, cost: 0, satisfaction: 0, latency: [] })),
      startTime: Date.now()
    });
  }
  
  selectModelForRequest(experimentName, userId) {
    const experiment = this.experiments.get(experimentName);
    if (!experiment) return null;
    
    // Deterministic assignment based on user ID
    const hash = this.hashUserId(userId);
    let cumulative = 0;
    
    for (let i = 0; i < experiment.trafficSplit.length; i++) {
      cumulative += experiment.trafficSplit[i];
      if (hash < cumulative) {
        return {
          model: experiment.models[i],
          variant: i,
          experimentName
        };
      }
    }
    
    return {
      model: experiment.models[0],
      variant: 0,
      experimentName
    };
  }
  
  recordResult(experimentName, variant, metrics) {
    const experiment = this.experiments.get(experimentName);
    if (!experiment) return;
    
    const variantMetrics = experiment.metrics[variant];
    variantMetrics.requests++;
    variantMetrics.cost += metrics.cost || 0;
    variantMetrics.satisfaction += metrics.satisfaction || 0;
    variantMetrics.latency.push(metrics.latency || 0);
  }
  
  getExperimentResults(experimentName) {
    const experiment = this.experiments.get(experimentName);
    if (!experiment) return null;
    
    const results = experiment.models.map((model, index) => {
      const metrics = experiment.metrics[index];
      return {
        model,
        requests: metrics.requests,
        avgCost: metrics.cost / (metrics.requests || 1),
        avgSatisfaction: metrics.satisfaction / (metrics.requests || 1),
        avgLatency: metrics.latency.reduce((a, b) => a + b, 0) / (metrics.latency.length || 1),
        p95Latency: this.calculatePercentile(metrics.latency, 95)
      };
    });
    
    return {
      experiment: experimentName,
      duration: Date.now() - experiment.startTime,
      results,
      winner: this.determineWinner(results)
    };
  }
  
  hashUserId(userId) {
    // Simple hash function for consistent assignment
    let hash = 0;
    for (let i = 0; i < userId.length; i++) {
      const char = userId.charCodeAt(i);
      hash = ((hash << 5) - hash) + char;
      hash = hash & hash;
    }
    return Math.abs(hash) / Math.pow(2, 31);
  }
  
  calculatePercentile(arr, percentile) {
    const sorted = arr.slice().sort((a, b) => a - b);
    const index = (percentile / 100) * (sorted.length - 1);
    return sorted[Math.round(index)];
  }
  
  determineWinner(results) {
    // Simple scoring: balance cost, satisfaction, and latency
    return results.reduce((best, current) => {
      const bestScore = (best.avgSatisfaction * 0.5) + 
                       (1 / best.avgCost * 0.3) + 
                       (1 / best.avgLatency * 0.2);
      const currentScore = (current.avgSatisfaction * 0.5) + 
                          (1 / current.avgCost * 0.3) + 
                          (1 / current.avgLatency * 0.2);
      
      return currentScore > bestScore ? current : best;
    });
  }
}

// Usage example
const abTest = new ModelABTesting();

// Create experiment
abTest.createExperiment(
  'chat_model_comparison',
  ['openai/gpt-4o-mini', 'anthropic/claude-3-haiku'],
  [0.5, 0.5]
);

// In your request handler
async function handleChatRequest(userId, message) {
  const assignment = abTest.selectModelForRequest('chat_model_comparison', userId);
  
  const startTime = Date.now();
  const response = await makeRequest(message, assignment.model);
  const latency = Date.now() - startTime;
  
  // Record results (satisfaction would come from user feedback)
  abTest.recordResult(assignment.experimentName, assignment.variant, {
    cost: response.cost,
    latency,
    satisfaction: 4.5 // From user feedback system
  });
  
  return response;
}

Step 2: Automated Optimization

// Automated cost optimization system
class AutoOptimizer {
  constructor() {
    this.rules = [
      {
        name: 'budget_based_downgrade',
        condition: (context) => context.remainingBudget < 0.2,
        action: (model) => this.downgradeToCheaper(model),
        priority: 1
      },
      {
        name: 'off_hours_optimization',
        condition: (context) => this.isOffHours(),
        action: (model) => this.selectEconomyModel(model),
        priority: 2
      },
      {
        name: 'task_based_optimization',
        condition: (context) => context.taskType === 'simple',
        action: (model) => this.selectSimpleTaskModel(),
        priority: 3
      }
    ];
  }
  
  optimizeRequest(originalModel, context) {
    // Apply optimization rules in priority order
    let optimizedModel = originalModel;
    
    for (const rule of this.rules.sort((a, b) => a.priority - b.priority)) {
      if (rule.condition(context)) {
        optimizedModel = rule.action(optimizedModel);
        console.log(`🎯 Applied rule: ${rule.name} (${originalModel} → ${optimizedModel})`);
        break; // Apply only the highest priority rule
      }
    }
    
    return optimizedModel;
  }
  
  downgradeToCheaper(model) {
    const downgrades = {
      'openai/gpt-4o': 'openai/gpt-4o-mini',
      'anthropic/claude-3.5-sonnet': 'anthropic/claude-3-haiku',
      'google/gemini-1.5-pro': 'google/gemini-1.5-flash'
    };
    
    return downgrades[model] || model;
  }
  
  selectEconomyModel(model) {
    // Use free models during off-hours for non-critical tasks
    const economyModels = [
      'meta-llama/llama-3.1-8b',
      'google/gemma-2-9b-it'
    ];
    
    return economyModels[0];
  }
  
  selectSimpleTaskModel() {
    return 'openai/gpt-4o-mini';
  }
  
  isOffHours() {
    const hour = new Date().getHours();
    return hour < 6 || hour >= 22; // 10 PM - 6 AM
  }
}

// Integration with your request system
const optimizer = new AutoOptimizer();

async function optimizedRequest(messages, model, context = {}) {
  const optimizedModel = optimizer.optimizeRequest(model, context);
  
  if (optimizedModel !== model) {
    console.log(`💡 Cost optimization: ${model} → ${optimizedModel}`);
  }
  
  return await openai.chat.completions.create({
    model: optimizedModel,
    messages,
    ...context.options
  });
}

Troubleshooting Common Issues

API Key and Authentication Issues

# Common error: Invalid API key
# Solution: Verify key in OpenRouter dashboard
curl -H "Authorization: Bearer $OPENROUTER_API_KEY" \
     -H "Content-Type: application/json" \
     "https://openrouter.ai/api/v1/models"

# Common error: Rate limiting
# Solution: Implement exponential backoff
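
A minimal retry wrapper with exponential backoff and jitter is sketched below; the retry counts and delays are illustrative starting points rather than OpenRouter recommendations.

// Retry with exponential backoff for rate-limited requests (sketch)
async function withBackoff(fn, maxRetries = 4, baseDelayMs = 500) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Retry only on rate limiting (HTTP 429 or a rate-limit message)
      const isRateLimit = error.status === 429 || /rate limit/i.test(error.message || '');
      if (!isRateLimit || attempt === maxRetries) throw error;

      // Exponential backoff with jitter: 500ms, 1s, 2s, 4s (+ random 0-250ms)
      const delay = baseDelayMs * 2 ** attempt + Math.random() * 250;
      console.warn(`Rate limited, retrying in ${Math.round(delay)}ms (attempt ${attempt + 1})`);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}

// Usage
const response = await withBackoff(() =>
  openai.chat.completions.create({
    model: 'openai/gpt-4o-mini',
    messages: [{ role: 'user', content: 'Hello' }]
  })
);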

Model Selection and Availability

// Handle model unavailability gracefully
async function robustModelRequest(messages, preferredModel, fallbacks = []) {
  const modelsToTry = [preferredModel, ...fallbacks];
  
  for (const model of modelsToTry) {
    try {
      console.log(`🎯 Trying model: ${model}`);
      
      const response = await openai.chat.completions.create({
        model,
        messages,
        max_tokens: 1000
      });
      
      console.log(`✅ Success with model: ${model}`);
      return response;
      
    } catch (error) {
      console.warn(`❌ Model ${model} failed: ${error.message}`);
      
      if (error.message.includes('model_not_found')) {
        continue; // Try next model
      } else {
        throw error; // Don't retry for other errors
      }
    }
  }
  
  throw new Error('All models failed or unavailable');
}

// Usage
const response = await robustModelRequest(
  messages,
  'openai/gpt-4o', // Preferred
  ['openai/gpt-4o-mini', 'anthropic/claude-3-haiku'] // Fallbacks
);

Performance Optimization

// Request batching for high-volume scenarios
class RequestBatcher {
  constructor(maxBatchSize = 10, maxWaitTime = 1000) {
    this.maxBatchSize = maxBatchSize;
    this.maxWaitTime = maxWaitTime;
    this.queue = [];
    this.timeoutId = null;
  }
  
  async addRequest(messages, model, options = {}) {
    return new Promise((resolve, reject) => {
      this.queue.push({
        messages,
        model,
        options,
        resolve,
        reject,
        timestamp: Date.now()
      });
      
      // Process batch if it's full or timeout
      if (this.queue.length >= this.maxBatchSize) {
        this.processBatch();
      } else if (!this.timeoutId) {
        this.timeoutId = setTimeout(() => this.processBatch(), this.maxWaitTime);
      }
    });
  }
  
  async processBatch() {
    if (this.timeoutId) {
      clearTimeout(this.timeoutId);
      this.timeoutId = null;
    }
    
    const batch = this.queue.splice(0, this.maxBatchSize);
    
    console.log(`🔄 Processing batch of ${batch.length} requests`);
    
    // Process all requests concurrently
    const promises = batch.map(async (request) => {
      try {
        const response = await openai.chat.completions.create({
          model: request.model,
          messages: request.messages,
          ...request.options
        });
        request.resolve(response);
      } catch (error) {
        request.reject(error);
      }
    });
    
    await Promise.allSettled(promises);
    
    // If requests remain queued, schedule the next batch
    if (this.queue.length > 0) {
      this.timeoutId = setTimeout(() => this.processBatch(), this.maxWaitTime);
    }
  }
}

// Usage
const batcher = new RequestBatcher(5, 500); // Max 5 requests, 500ms wait

// These requests will be batched together
const [response1, response2] = await Promise.all([
  batcher.addRequest(messages1, 'openai/gpt-4o-mini'),
  batcher.addRequest(messages2, 'openai/gpt-4o-mini')
  // ... more requests
]);

Conclusion and Next Steps

Following this implementation guide will help you achieve significant cost savings (typically 20-50%) while improving reliability and gaining better visibility into your AI usage patterns.

Recommended Rollout Approach

  1. Start Small: Begin with development environments and non-critical workloads
  2. Monitor Closely: Track costs and performance for the first 30 days
  3. Optimize Gradually: Implement caching, A/B testing, and auto-optimization features
  4. Scale Up: Move production workloads after validation
  5. Continuous Improvement: Regular review and optimization of routing strategies

Additional Resources

Remember: The key to successful OpenRouter implementation is gradual rollout with careful monitoring and optimization. Start conservative and optimize based on real usage data.