LLM API Error Handling: Build Bulletproof AI Applications

Error handling is critical for production-grade AI applications. This guide walks through the common LLM API error types and the handling strategies that keep your application stable when requests fail.

Common Error Types Explained

Authentication Errors (4xx)

401 Unauthorized

API key is invalid or expired

Non-retryable
// Handling solution (logger, notifyOps, and AuthError are app-level utilities)
if (error.status === 401) {
  logger.error('Invalid API key');
  // Notify operations to rotate/update the key
  await notifyOps('API_KEY_INVALID');
  // Surface a user-friendly error instead of the raw 401
  throw new AuthError('Service temporarily unavailable, please try again later');
}

403 Forbidden

Insufficient permissions or quota exhausted

Check quota
// Handling solution (useBackupAccount is an app-level fallback, assumed)
if (error.status === 403) {
  const reason = error.data?.reason;
  if (reason === 'quota_exceeded') {
    // Switch to a backup account or wait for the quota to reset
    return await useBackupAccount();
  }
}

Rate Limit Errors (429)

429 Too Many Requests

Request rate exceeded

Retryable
// Smart retry strategy (sleep(ms) is a promise-based delay helper, assumed)
async function handleRateLimit(error, retryCount, retry) {
  // Prefer the server-suggested wait time when one is provided
  const retryAfter = error.headers['retry-after'] ||
                     error.headers['x-ratelimit-reset-after'];

  if (retryAfter) {
    await sleep(Number(retryAfter) * 1000);
  } else {
    // Exponential backoff, capped at 60s
    const backoff = Math.min(
      1000 * Math.pow(2, retryCount),
      60000
    );
    // Add jitter so concurrent clients don't retry in lockstep
    await sleep(backoff + Math.random() * 1000);
  }

  return retry();
}

Server-side Errors (5xx)

500 Internal Server Error

A transient fault inside the provider's infrastructure

Retryable
503 Service Unavailable

Service temporarily unavailable

Degradable
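
How these two cases might be routed, as a minimal sketch: retryWithBackoff and useFallbackModel are hypothetical helpers standing in for the retry and degradation logic shown elsewhere in this guide.

// Route 5xx errors: retry transient faults, degrade when the service is down
// (retryWithBackoff and useFallbackModel are hypothetical helpers)
async function handleServerError(error, request) {
  if ([500, 502, 504].includes(error.status)) {
    // Transient server fault: retry with exponential backoff
    return retryWithBackoff(request);
  }
  if (error.status === 503) {
    // Service unavailable: degrade to a cached or fallback response
    return useFallbackModel(request);
  }
  throw error; // Not a server-side error we know how to handle
}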

Business Errors

context_length_exceeded

Context length exceeds the model limit

// Handling solution: intelligent truncation
// (countTokens is a tokenizer helper, assumed; see the sketch below)
function truncateContext(messages, maxTokens) {
  let totalTokens = 0;
  const truncated = [];

  // Walk backwards so the newest messages are kept first
  for (let i = messages.length - 1; i >= 0; i--) {
    const tokens = countTokens(messages[i]);
    if (totalTokens + tokens <= maxTokens) {
      truncated.unshift(messages[i]);
      totalTokens += tokens;
    } else {
      break;
    }
  }

  return truncated;
}
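
The snippet above assumes a countTokens helper. A rough character-based estimate (about 4 characters per token for English text) is enough for a sketch; a real tokenizer such as tiktoken gives exact counts.

// Rough estimate: ~4 characters per token for English text (an approximation;
// swap in a real tokenizer like tiktoken for exact counts)
function countTokens(message) {
  return Math.ceil(message.content.length / 4);
}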

Comprehensive Error Handling Architecture

class LLMAPIClient {
  constructor(config) {
    this.config = config;
    // Metrics hooks are injected via config (no-op default assumed)
    this.metrics = config.metrics ?? { recordSuccess() {} };
    this.retryConfig = {
      maxRetries: 3,
      retryableErrors: [429, 500, 502, 503, 504],
      backoffMultiplier: 2,
      maxBackoff: 60000
    };
  }

  // makeRequest, validateResponse, logError, wrapError, compressMessages,
  // getFallbackModel, and MaxRetriesError are app-specific and omitted for brevity
  async callAPI(params) {
    let lastError;
    
    for (let attempt = 0; attempt <= this.retryConfig.maxRetries; attempt++) {
      try {
        // Add timeout control
        const response = await this.makeRequest(params, {
          timeout: 30000,
          signal: AbortSignal.timeout(30000)
        });
        
        // Validate response
        this.validateResponse(response);
        
        // Record success
        this.metrics.recordSuccess(attempt);
        
        return response;
        
      } catch (error) {
        lastError = error;
        
        // Log error
        this.logError(error, attempt);
        
        // Determine if retryable
        if (!this.shouldRetry(error, attempt)) {
          throw this.wrapError(error);
        }
        
        // Wait before retry
        await this.waitBeforeRetry(error, attempt);
        
        // Recovery strategy
        params = await this.applyRecoveryStrategy(error, params);
      }
    }
    
    throw new MaxRetriesError(lastError);
  }

  shouldRetry(error, attempt) {
    // Non-retryable errors
    if (error.code === 'invalid_api_key' || 
        error.code === 'insufficient_quota') {
      return false;
    }
    
    // Max retry attempts reached
    if (attempt >= this.retryConfig.maxRetries) {
      return false;
    }
    
    // Check HTTP status code
    return this.retryConfig.retryableErrors.includes(error.status);
  }

  async waitBeforeRetry(error, attempt) {
    let delay;
    
    // Prefer server-provided retry time
    if (error.retryAfter) {
      delay = error.retryAfter * 1000;
    } else {
      // Exponential backoff + jitter
      delay = Math.min(
        1000 * Math.pow(this.retryConfig.backoffMultiplier, attempt),
        this.retryConfig.maxBackoff
      );
      delay += Math.random() * 1000; // Add jitter
    }
    
    await new Promise(resolve => setTimeout(resolve, delay));
  }

  async applyRecoveryStrategy(error, params) {
    switch (error.code) {
      case 'context_length_exceeded':
        // Compress context
        return {
          ...params,
          messages: this.compressMessages(params.messages)
        };
        
      case 'model_overloaded':
        // Degrade to a faster model
        return {
          ...params,
          model: this.getFallbackModel(params.model)
        };
        
      default:
        return params;
    }
  }
}
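
A usage sketch (the config shape and the message format are assumptions here, following the common chat-completion convention):

// Hypothetical usage (inside an async context): the client retries,
// backs off, and applies recovery strategies on its own
const client = new LLMAPIClient({
  apiKey: process.env.LLM_API_KEY,
  metrics: { recordSuccess: attempt => console.log(`ok after ${attempt} retries`) }
});

const response = await client.callAPI({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Summarize this document.' }]
});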

Advanced Error Handling Strategies

🔄 Circuit Breaker Pattern

class CircuitBreaker {
  constructor(threshold = 5, timeout = 60000) {
    this.failureCount = 0;
    this.threshold = threshold;
    this.timeout = timeout;
    this.state = 'CLOSED';
    this.nextAttempt = Date.now();
  }

  async call(fn) {
    if (this.state === 'OPEN') {
      // Stay open until the cooldown expires, then allow one trial request
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit breaker is OPEN');
      }
      this.state = 'HALF_OPEN';
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;
    this.state = 'CLOSED';
  }

  onFailure() {
    this.failureCount++;
    if (this.failureCount >= this.threshold) {
      this.state = 'OPEN';
      this.nextAttempt = Date.now() + this.timeout;
    }
  }
}
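
Wrapping the client from the previous section in a breaker might look like this (a sketch; client is the LLMAPIClient instance from above, and getFallbackResponse is a hypothetical degraded-response helper):

// Trip after 5 consecutive failures; probe again after 60s
const breaker = new CircuitBreaker(5, 60000);

async function safeCall(params) {
  try {
    return await breaker.call(() => client.callAPI(params));
  } catch (error) {
    // Breaker is open or the call failed: serve a degraded response instead
    return getFallbackResponse(params);
  }
}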

🎯 Intelligent Fallback

class FallbackStrategy {
  constructor() {
    // Ordered from highest quality to cheapest; tiers are tried in turn
    // (getCachedResponse, getStaticResponse, and callModel are assumed helpers)
    this.modelHierarchy = [
      { name: 'gpt-4', quality: 10, cost: 10 },
      { name: 'gpt-3.5-turbo', quality: 7, cost: 1 },
      { name: 'cached-response', quality: 5, cost: 0 },
      { name: 'static-response', quality: 3, cost: 0 }
    ];
  }

  async execute(task) {
    for (const model of this.modelHierarchy) {
      try {
        if (model.name === 'cached-response') {
          return await this.getCachedResponse(task);
        }
        
        if (model.name === 'static-response') {
          return this.getStaticResponse(task);
        }
        
        return await this.callModel(model.name, task);
      } catch (error) {
        console.warn(`Fallback from ${model.name}`, error);
        continue;
      }
    }
    
    throw new Error('All fallback options exhausted');
  }
}

Error Monitoring and Alerts

Real-time Error Tracking

Monitoring Metrics

  • Error rate threshold: > 1%
  • P99 response time: < 5s
  • Retry success rate: > 80%
  • Fallback trigger rate: < 5%
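
A minimal sketch of tracking the first metric client-side with a fixed time window (production systems would usually delegate this to a metrics backend such as Prometheus or Datadog):

// Fixed-window error-rate tracker (a sketch, not a production metrics system)
class ErrorRateTracker {
  constructor(windowMs = 5 * 60 * 1000) {
    this.windowMs = windowMs;
    this.events = []; // { timestamp, ok }
  }

  record(ok) {
    const now = Date.now();
    this.events.push({ timestamp: now, ok });
    // Drop events that fell out of the window
    this.events = this.events.filter(e => now - e.timestamp <= this.windowMs);
  }

  errorRate() {
    if (this.events.length === 0) return 0;
    const failures = this.events.filter(e => !e.ok).length;
    return failures / this.events.length;
  }
}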

Alert Rules

// Alert configuration
const alerts = {
  criticalErrorRate: {
    condition: 'error_rate > 5%',
    window: '5m',
    action: 'page_oncall'
  },
  highLatency: {
    condition: 'p99_latency > 10s',
    window: '10m',
    action: 'notify_team'
  },
  quotaWarning: {
    condition: 'quota_usage > 80%',
    window: '1h',
    action: 'email_admin'
  }
};

Error Recovery Best Practices

1. Graceful Degradation

Provide degraded but functional services when the primary service is unavailable.

  • Use cached responses
  • Switch to simplified features
  • Serve static content
  • Defer non-critical operations

2. Fault Isolation

Prevent errors from propagating through the entire system (a concurrency-limiter sketch follows the list).

  • Use separate error boundaries
  • Isolate different feature modules
  • Implement timeouts
  • Limit concurrent requests
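
For the last item, a minimal semaphore-style limiter (a sketch; libraries such as p-limit provide the same thing off the shelf):

// Cap in-flight requests so one slow dependency can't exhaust resources
class ConcurrencyLimiter {
  constructor(maxConcurrent = 10) {
    this.maxConcurrent = maxConcurrent;
    this.active = 0;
    this.queue = [];
  }

  async run(fn) {
    if (this.active >= this.maxConcurrent) {
      // Wait until a finishing task hands its slot to us
      await new Promise(resolve => this.queue.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await fn();
    } finally {
      const next = this.queue.shift();
      if (next) {
        next(); // Hand the slot directly to the next waiter
      } else {
        this.active--;
      }
    }
  }
}

// Usage: const limiter = new ConcurrencyLimiter(10);
//        await limiter.run(() => client.callAPI(params));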

3. Rapid Recovery

Minimize impact and restore services quickly (a health-check and failover sketch follows the list).

  • Automatic failover
  • Health check mechanisms
  • Rollback strategies
  • Disaster recovery plans
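
A sketch combining the first two items: probe each provider and route traffic to the first healthy one (the endpoint URLs and the /health path are assumptions):

// Fail over to the first healthy provider (URLs are hypothetical)
const providers = [
  { name: 'primary', url: 'https://api.primary.example/v1' },
  { name: 'backup', url: 'https://api.backup.example/v1' }
];

async function isHealthy(provider) {
  try {
    const res = await fetch(`${provider.url}/health`, {
      signal: AbortSignal.timeout(2000) // Fail fast on a hung endpoint
    });
    return res.ok;
  } catch {
    return false;
  }
}

async function pickProvider() {
  for (const provider of providers) {
    if (await isHealthy(provider)) return provider;
  }
  throw new Error('No healthy provider available');
}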

Error Handling Checklist

Before shipping, confirm that your application:

  • Distinguishes retryable errors (429, 5xx) from non-retryable ones (401, 403)
  • Honors Retry-After headers and uses exponential backoff with jitter
  • Sets request timeouts and caps concurrent requests
  • Truncates or compresses context before hitting model limits
  • Has a fallback hierarchy (cheaper model, then cache, then static response)
  • Uses a circuit breaker to stop hammering a failing service
  • Tracks error rate, latency, and fallback rate, with alerts wired up

Build Never-Down AI Applications

Modern LLM APIs return structured error responses and publish their rate limits and SLAs. Combined with the error handling strategies in this guide, your AI application can stay available even when individual requests fail.
