LLM API Error Handling: Build Bulletproof AI Applications
Error handling is critical for production-grade AI applications. This guide walks through the common LLM API error types and the handling strategies that keep an application stable when calls fail.
Common Error Types Explained
Authentication and Authorization Errors (401/403)
API key is invalid or expired
// Handling solution
if (error.status === 401) {
  logger.error('Invalid API key');
  // Notify operations to rotate/update the key
  await notifyOps('API_KEY_INVALID');
  // Return a user-friendly error
  throw new AuthError('Service temporarily unavailable, please try again later');
}

Insufficient permissions or quota exhausted
// Handling solution
if (error.status === 403) {
  const reason = error.data?.reason;
  if (reason === 'quota_exceeded') {
    // Switch to a backup account or wait for the quota reset
    return await useBackupAccount();
  }
}

Rate Limit Errors (429)
Request rate exceeded
// Smart retry strategy (sleep and retry are assumed helpers)
async function handleRateLimit(error, retryCount) {
  const retryAfter = error.headers['retry-after'] ||
    error.headers['x-ratelimit-reset-after'];
  if (retryAfter) {
    // Use the server-suggested wait time (header value is in seconds)
    await sleep(Number(retryAfter) * 1000);
  } else {
    // Exponential backoff
    const backoff = Math.min(
      1000 * Math.pow(2, retryCount),
      60000 // Max 60s
    );
    await sleep(backoff + Math.random() * 1000); // add jitter
  }
  return retry();
}

Server-side Errors (5xx)
500: internal server error
502/503/504: service temporarily unavailable or overloaded
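Unlike authentication failures, 5xx errors are usually transient and worth retrying. A minimal sketch of that idea, with exponential backoff (the `sleep` helper, the set of retryable status codes, and the `baseDelay` parameter are assumptions, not part of any provider's SDK):

```javascript
// Status codes treated as transient, retryable server failures.
const TRANSIENT = new Set([500, 502, 503, 504]);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry fn() on transient 5xx errors with exponential backoff
// (baseDelay, 2*baseDelay, 4*baseDelay, ...); rethrow anything else.
async function withServerErrorRetry(fn, maxRetries = 3, baseDelay = 1000) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxRetries || !TRANSIENT.has(error.status)) {
        throw error; // non-transient error, or out of attempts
      }
      await sleep(baseDelay * 2 ** attempt);
    }
  }
}
```

Wrapping an API call is then `withServerErrorRetry(() => client.chat(params))`; 4xx errors pass straight through to the caller.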
Business Errors
Context length exceeds the model limit
// Handling solution: intelligent truncation
// (countTokens is an assumed tokenizer helper, e.g. backed by tiktoken)
function truncateContext(messages, maxTokens) {
  let totalTokens = 0;
  const truncated = [];
  // Keep the newest messages first
  for (let i = messages.length - 1; i >= 0; i--) {
    const tokens = countTokens(messages[i]);
    if (totalTokens + tokens <= maxTokens) {
      truncated.unshift(messages[i]);
      totalTokens += tokens;
    } else {
      break;
    }
  }
  return truncated;
}

Comprehensive Error Handling Architecture
class LLMAPIClient {
  constructor(config) {
    this.config = config;
    this.metrics = config.metrics; // injected metrics recorder
    this.retryConfig = {
      maxRetries: 3,
      retryableErrors: [429, 500, 502, 503, 504],
      backoffMultiplier: 2,
      maxBackoff: 60000
    };
  }

  async callAPI(params) {
    let lastError;
    for (let attempt = 0; attempt <= this.retryConfig.maxRetries; attempt++) {
      try {
        // Add timeout control
        const response = await this.makeRequest(params, {
          timeout: 30000,
          signal: AbortSignal.timeout(30000)
        });
        // Validate the response
        this.validateResponse(response);
        // Record success
        this.metrics.recordSuccess(attempt);
        return response;
      } catch (error) {
        lastError = error;
        // Log the error
        this.logError(error, attempt);
        // Determine whether it is retryable
        if (!this.shouldRetry(error, attempt)) {
          throw this.wrapError(error);
        }
        // Wait before retrying
        await this.waitBeforeRetry(error, attempt);
        // Apply a recovery strategy
        params = await this.applyRecoveryStrategy(error, params);
      }
    }
    throw new MaxRetriesError(lastError);
  }

  shouldRetry(error, attempt) {
    // Non-retryable errors
    if (error.code === 'invalid_api_key' ||
        error.code === 'insufficient_quota') {
      return false;
    }
    // Max retry attempts reached
    if (attempt >= this.retryConfig.maxRetries) {
      return false;
    }
    // Check the HTTP status code
    return this.retryConfig.retryableErrors.includes(error.status);
  }

  async waitBeforeRetry(error, attempt) {
    let delay;
    // Prefer the server-provided retry time
    if (error.retryAfter) {
      delay = error.retryAfter * 1000;
    } else {
      // Exponential backoff + jitter
      delay = Math.min(
        1000 * Math.pow(this.retryConfig.backoffMultiplier, attempt),
        this.retryConfig.maxBackoff
      );
      delay += Math.random() * 1000; // Add jitter
    }
    await new Promise(resolve => setTimeout(resolve, delay));
  }

  async applyRecoveryStrategy(error, params) {
    switch (error.code) {
      case 'context_length_exceeded':
        // Compress the context
        return {
          ...params,
          messages: this.compressMessages(params.messages)
        };
      case 'model_overloaded':
        // Degrade to a faster model
        return {
          ...params,
          model: this.getFallbackModel(params.model)
        };
      default:
        return params;
    }
  }
}

Advanced Error Handling Strategies
🔄 Circuit Breaker Pattern
class CircuitBreaker {
  constructor(threshold = 5, timeout = 60000) {
    this.failureCount = 0;
    this.threshold = threshold;
    this.timeout = timeout;
    this.state = 'CLOSED';
    this.nextAttempt = Date.now();
  }

  async call(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit breaker is OPEN');
      }
      this.state = 'HALF_OPEN';
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;
    this.state = 'CLOSED';
  }

  onFailure() {
    this.failureCount++;
    if (this.failureCount >= this.threshold) {
      this.state = 'OPEN';
      this.nextAttempt = Date.now() + this.timeout;
    }
  }
}

🎯 Intelligent Fallback
class FallbackStrategy {
  constructor() {
    this.modelHierarchy = [
      { name: 'gpt-4', quality: 10, cost: 10 },
      { name: 'gpt-3.5-turbo', quality: 7, cost: 1 },
      { name: 'cached-response', quality: 5, cost: 0 },
      { name: 'static-response', quality: 3, cost: 0 }
    ];
  }

  async execute(task) {
    for (const model of this.modelHierarchy) {
      try {
        if (model.name === 'cached-response') {
          return await this.getCachedResponse(task);
        }
        if (model.name === 'static-response') {
          return this.getStaticResponse(task);
        }
        return await this.callModel(model.name, task);
      } catch (error) {
        console.warn(`Fallback from ${model.name}`, error);
        continue;
      }
    }
    throw new Error('All fallback options exhausted');
  }
}

Error Monitoring and Alerts
Real-time Error Tracking
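As one possible shape for real-time tracking, here is a small in-memory tracker that counts errors by status code and reports an overall error rate. It is a sketch; class and field names are illustrative, and a production system would ship these counts to a metrics backend instead of holding them in memory:

```javascript
// Minimal in-memory error tracker: counts requests and errors,
// grouped by error code, and reports an overall error rate.
class ErrorTracker {
  constructor() {
    this.totalRequests = 0;
    this.errorsByCode = new Map();
  }

  recordRequest() {
    this.totalRequests++;
  }

  recordError(code) {
    this.totalRequests++;
    this.errorsByCode.set(code, (this.errorsByCode.get(code) || 0) + 1);
  }

  errorRate() {
    const errors = [...this.errorsByCode.values()].reduce((a, b) => a + b, 0);
    return this.totalRequests === 0 ? 0 : errors / this.totalRequests;
  }
}
```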
Monitoring Metrics
- Error rate threshold: > 1%
- P99 response time: < 5s
- Retry success rate: > 80%
- Fallback trigger rate: < 5%
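The thresholds above can be checked mechanically. A hypothetical sketch (the metric names and the shape of the `metrics` object are assumptions):

```javascript
// Thresholds mirroring the metrics listed above.
const THRESHOLDS = {
  errorRate:    { max: 0.01 },  // error rate threshold: > 1%
  p99LatencyMs: { max: 5000 },  // P99 response time: < 5s
  retrySuccess: { min: 0.8 },   // retry success rate: > 80%
  fallbackRate: { max: 0.05 },  // fallback trigger rate: < 5%
};

// Return the names of metrics that violate their thresholds.
function violatedMetrics(metrics) {
  return Object.entries(THRESHOLDS)
    .filter(([name, t]) =>
      ('max' in t && metrics[name] > t.max) ||
      ('min' in t && metrics[name] < t.min))
    .map(([name]) => name);
}
```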
Alert Rules
// Alert configuration
const alerts = {
  criticalErrorRate: {
    condition: 'error_rate > 5%',
    window: '5m',
    action: 'page_oncall'
  },
  highLatency: {
    condition: 'p99_latency > 10s',
    window: '10m',
    action: 'notify_team'
  },
  quotaWarning: {
    condition: 'quota_usage > 80%',
    window: '1h',
    action: 'email_admin'
  }
};

Error Recovery Best Practices
1. Graceful Degradation
Provide degraded but functional services when the primary service is unavailable.
- Use cached responses
- Switch to simplified features
- Serve static content
- Defer non-critical operations
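The cached-response option can be sketched as a wrapper that refreshes a TTL cache on every successful call and falls back to the stale copy when the primary fails. The class name, cache keying, and TTL are assumptions for illustration:

```javascript
// Serve from a TTL cache when the primary call fails: degraded
// freshness, but the feature keeps working.
class DegradableService {
  constructor(primary, ttlMs = 5 * 60 * 1000) {
    this.primary = primary;   // async (key) => response
    this.ttlMs = ttlMs;
    this.cache = new Map();   // key -> { value, storedAt }
  }

  async get(key) {
    try {
      const value = await this.primary(key);
      this.cache.set(key, { value, storedAt: Date.now() });
      return { value, degraded: false };
    } catch (error) {
      const hit = this.cache.get(key);
      if (hit && Date.now() - hit.storedAt < this.ttlMs) {
        return { value: hit.value, degraded: true }; // stale but usable
      }
      throw error; // nothing cached: surface the failure
    }
  }
}
```

Returning a `degraded` flag lets the UI disclose that the answer may be stale instead of failing silently.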
2. Fault Isolation
Prevent errors from propagating through the entire system.
- Use separate error boundaries
- Isolate different feature modules
- Implement timeouts
- Limit concurrent requests
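Limiting concurrent requests needs no library: a small promise-based semaphore is enough. This is a sketch, not tied to any particular framework:

```javascript
// Promise-based semaphore: at most `limit` tasks run at once;
// extra callers queue until a slot frees up.
class Semaphore {
  constructor(limit) {
    this.limit = limit;
    this.active = 0;
    this.queue = [];
  }

  async run(task) {
    if (this.active >= this.limit) {
      // Park this caller until a running task releases a slot.
      await new Promise((resolve) => this.queue.push(resolve));
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--;
      const next = this.queue.shift();
      if (next) next(); // wake exactly one waiter per freed slot
    }
  }
}
```

Wrapping every LLM call in `semaphore.run(...)` caps in-flight requests, which both respects provider rate limits and keeps one overloaded feature from starving the rest.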
3. Rapid Recovery
Minimize impact and restore services quickly.
- Automatic failover
- Health check mechanisms
- Rollback strategies
- Disaster recovery plans
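A health check that drives automatic failover can be as simple as tracking consecutive probe results per endpoint: take it out of rotation after a run of failures, restore it after a run of successes. The thresholds below are assumptions:

```javascript
// Track endpoint health from probe results: unhealthy after
// `failThreshold` consecutive failures, healthy again after
// `okThreshold` consecutive successes (hysteresis avoids flapping).
class HealthCheck {
  constructor(failThreshold = 3, okThreshold = 2) {
    this.failThreshold = failThreshold;
    this.okThreshold = okThreshold;
    this.consecutiveFailures = 0;
    this.consecutiveSuccesses = 0;
    this.healthy = true;
  }

  report(ok) {
    if (ok) {
      this.consecutiveSuccesses++;
      this.consecutiveFailures = 0;
      if (this.consecutiveSuccesses >= this.okThreshold) this.healthy = true;
    } else {
      this.consecutiveFailures++;
      this.consecutiveSuccesses = 0;
      if (this.consecutiveFailures >= this.failThreshold) this.healthy = false;
    }
  }
}
```

A router would call `report(...)` after each probe and only send traffic to endpoints whose `healthy` flag is set.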
Build Never-Down AI Applications
Major LLM providers offer documented error semantics and availability guarantees. Combined with the error handling strategies above, your AI application can stay available through most failure scenarios.