LLM API Best Practices: Development Guide from Zero to Pro

This guide compiles LLM API development best practices and lessons learned to help developers avoid common pitfalls and build high-quality, high-performance AI applications.

Prompt Engineering Best Practices

Golden Rules for High-Quality Prompts

1. Clear and Specific Instructions

❌ Bad Example

write an article about AI

✅ Good Example

Please write an 800-word technical blog post
Topic: How enterprises can use LLM API to improve efficiency
Target readers: Technical decision-makers
Include: 1) Application scenarios 2) ROI analysis 3) Implementation suggestions

2. Structured Output

const prompt = `Please analyze the following customer feedback and output JSON in the format below:
{
  "sentiment": "positive/negative/neutral",
  "category": "product/service/price/other",
  "priority": "high/medium/low",
  "summary": "Summary within 50 words",
  "suggestions": ["Suggestion 1", "Suggestion 2"]
}

Feedback: ${feedback}`;
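
To use this in code, parse and validate the model's reply before trusting it. A minimal sketch, assuming a generic llmClient.complete call and a buildPrompt helper that wraps the template above (both illustrative, not a specific SDK):

async function analyzeFeedback(feedback) {
  const response = await llmClient.complete({
    model: "gpt-4",
    messages: [{ role: "user", content: buildPrompt(feedback) }],
    temperature: 0  // deterministic output keeps the JSON stable
  });

  let data;
  try {
    data = JSON.parse(response.choices[0].message.content);
  } catch {
    throw new Error("Model did not return valid JSON");
  }

  // Check the fields the prompt asked for before using the result
  for (const field of ["sentiment", "category", "priority", "summary", "suggestions"]) {
    if (!(field in data)) throw new Error(`Missing field: ${field}`);
  }
  return data;
}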

3. Few-shot Learning

const prompt = `Classify each sentence into: Technical Issue / Account Issue / Suggestion

Examples:
Sentence: My API key is not working
Label: Account Issue

Sentence: Could you add a Python SDK?
Label: Suggestion

Sentence: The model response is very slow
Label: Technical Issue

Sentence: ${userInput}
Label:`;
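
For classification prompts like this, you usually want a short, deterministic completion. One common setup (illustrative client call, not a specific SDK) is temperature 0, a small max_tokens cap, and a newline stop sequence so only the label comes back:

const response = await llmClient.complete({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: prompt }],
  temperature: 0,  // classification should be deterministic
  max_tokens: 5,   // the label is only a few tokens
  stop: ["\n"]     // stop after the label line
});

const label = response.choices[0].message.content.trim();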

Cost Optimization Strategies

Practical Tips to Reduce API Costs

📊 Token Optimization

  • Shorter prompts: Remove redundant descriptions
  • Context compression: Keep only necessary history
  • Limit output: Set the max_tokens parameter

🎯 Model Selection

  • Tiered usage: Use smaller models for simple tasks
  • Hybrid strategy: Route requests to different models (see the routing sketch below)
  • Batch processing: Merge similar requests
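
A tiered routing strategy can start as a simple heuristic on the task. The sketch below is illustrative; the complexity check, model names, and llmClient helper are assumptions to adapt to your own workload:

// Route short, simple tasks to a cheaper model; reserve the large
// model for tasks that need more reasoning.
function pickModel(task) {
  const isSimple = task.prompt.length < 500 && !task.needsReasoning;
  return isSimple ? "gpt-3.5-turbo" : "gpt-4";
}

async function routedComplete(task) {
  return llmClient.complete({
    model: pickModel(task),
    messages: [{ role: "user", content: task.prompt }]
  });
}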

Cost Monitoring Code Examples

class CostTracker:
    def __init__(self):
        # Cumulative token usage plus illustrative per-1K-token rates (USD);
        # substitute your provider's current pricing.
        self.usage = {"gpt-4": 0, "gpt-3.5": 0}
        self.costs = {"gpt-4": 0.03, "gpt-3.5": 0.002}

    def track(self, model, tokens):
        self.usage[model] += tokens
        cost = (tokens / 1000) * self.costs[model]

        if cost > 0.1:  # Single request exceeds $0.10
            self.alert_high_cost(model, tokens, cost)

        return cost

    def alert_high_cost(self, model, tokens, cost):
        # Hook for your alerting channel (log, Slack, pager, ...)
        print(f"High-cost request: {model}, {tokens} tokens, ${cost:.2f}")

Error Handling Best Practices

Build a Robust Error Handling Mechanism

Comprehensive Error Handling Example

// Note: sleep, compressPrompt, and fallbackModel are app-level helpers
// assumed to be defined elsewhere.
async function callLLMAPI(prompt, retries = 3) {
  const errors = [];
  
  for (let i = 0; i < retries; i++) {
    try {
      const response = await llmClient.complete({
        model: "gpt-4",
        messages: [{ role: "user", content: prompt }],
        temperature: 0.7,
        timeout: 30000  // 30s timeout
      });
      
      // Validate response
      if (!response.choices?.[0]?.message?.content) {
        throw new Error("Invalid response format");
      }
      
      return response;
      
    } catch (error) {
      errors.push(error);
      
      // Handle error types
      if (error.code === 'rate_limit_exceeded') {
        const waitTime = error.retry_after || Math.pow(2, i) * 1000;
        await sleep(waitTime);
        continue;
      }
      
      if (error.code === 'context_length_exceeded') {
        // Compress context and retry
        prompt = compressPrompt(prompt);
        continue;
      }
      
      if (error.code === 'service_unavailable') {
        // Fallback to backup model
        return await fallbackModel(prompt);
      }
      
      // Non-retryable
      if (['invalid_api_key', 'invalid_request'].includes(error.code)) {
        throw error;
      }
    }
  }
  
  // All retries failed
  throw new AggregateError(errors, 'All retries failed');
}

Error Categories

  • Retryable: Timeout, rate limiting, temporary service unavailability
  • Degradable: Model unavailable, slow responses
  • Fix required: Context too long, format errors
  • Non-recoverable: Authentication failure, insufficient balance

Fallback Strategies

  • GPT-4 → GPT-3.5 → Claude (see the chain sketch below)
  • Complex prompts → Simplified prompts
  • Real-time generation → Cached results
  • API calls → Local model
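
A model fallback chain can be implemented as an ordered list of candidates tried in turn. A minimal sketch (the model names, llmClient helper, and error check are illustrative):

const FALLBACK_CHAIN = ["gpt-4", "gpt-3.5-turbo", "claude-3-sonnet"];

async function completeWithFallback(messages) {
  let lastError;
  for (const model of FALLBACK_CHAIN) {
    try {
      return await llmClient.complete({ model, messages });
    } catch (error) {
      lastError = error;
      // Only fall through on availability problems; rethrow the rest
      if (error.code !== 'service_unavailable') throw error;
    }
  }
  throw lastError;
}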

Performance Optimization Tips

Speed Up Your Application

🚀 Streaming

// Display generated content in real time
const stream = await openai.chat.completions.create({
  model: "gpt-4",
  messages: messages,
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(
    chunk.choices[0]?.delta?.content || ""
  );
}

💾 Smart Caching

// Semantic similarity cache
const cache = new SemanticCache({
  threshold: 0.95,
  ttl: 3600
});

const cached = await cache.get(prompt);
if (cached) return cached;

const result = await llm.complete(prompt);
await cache.set(prompt, result);
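
SemanticCache here is illustrative rather than a specific library. Internally, such a cache typically embeds each prompt and compares cosine similarity against stored entries; a minimal sketch, assuming an embed(text) helper that returns a numeric vector:

function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function lookup(entries, prompt, threshold = 0.95) {
  const vector = await embed(prompt);
  for (const entry of entries) {
    if (cosineSimilarity(vector, entry.vector) >= threshold) {
      return entry.result;  // close enough to reuse
    }
  }
  return null;  // cache miss
}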

⚡ Concurrency Optimization

// Batch concurrent requests
const results = await Promise.all(
  prompts.map(prompt => llm.complete(prompt))
);

// Limit concurrency: pLimit(5) returns a limiter that wraps each task
const limit = pLimit(5);
const limitedResults = await Promise.all(
  prompts.map(p => limit(() => llm.complete(p)))
);

Security Best Practices

Protect Your AI Application

🔐 API Key Management

❌ Don't do this

const API_KEY = "sk-xxxxx"; // Hardcoded
git add .  // Committed to repo

✅ Do this instead

// Use environment variables
const API_KEY = process.env.LLM_API_KEY;
// Or a secret manager service
const API_KEY = await secretManager.get('llm-key');

🛡️ Input Validation and Filtering

function sanitizeInput(userInput) {
  // Length limit
  if (userInput.length > 1000) {
    throw new Error("Input too long");
  }
  
  // Remove potential prompt injections
  const forbidden = [
    'ignore previous instructions',
    'system:', 
    '```python'
  ];
  
  for (const pattern of forbidden) {
    if (userInput.toLowerCase().includes(pattern)) {
      throw new Error("Invalid input detected");
    }
  }
  
  // Content moderation (containsSensitiveContent stands in for your
  // moderation check, e.g. a call to a moderation API)
  if (containsSensitiveContent(userInput)) {
    throw new Error("Content policy violation");
  }
  
  return userInput;
}

Monitoring and Debugging

Build a Complete Monitoring System

Key Metrics

  • API response time: P95 < 3s (see the sketch below)
  • Error rate: < 0.1%
  • Token usage: Real-time tracking
  • Cost consumption: Hourly statistics
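
Tracking a P95 target means keeping per-request timings. A minimal in-memory sketch; in production you would feed these into your metrics system instead:

const latencies = [];

function recordLatency(ms) {
  latencies.push(ms);
}

function p95() {
  if (latencies.length === 0) return 0;
  const sorted = [...latencies].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95));
  return sorted[index];
}

if (p95() > 3000) {
  console.warn(`P95 latency ${p95()}ms exceeds the 3s target`);
}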

Logging Best Practices

logger.info('LLM API Request', {
  requestId: uuid(),
  model: 'gpt-4',
  promptTokens: 150,
  timestamp: Date.now(),
  userId: user.id,
  // Never log full prompts
  promptHash: hash(prompt),
  promptLength: prompt.length
});

Development Workflow Tips

From Dev to Production

1. Prototyping

  • Use the Playground to test quickly
  • Record effective prompt templates
  • Compare performance across different models

2. Testing & Optimization

  • Build a test dataset
  • A/B test different strategies
  • Optimize cost and performance

3. Production

  • Implement comprehensive error handling
  • Configure monitoring and alerts
  • Prepare fallback strategies

Common Pitfalls and Solutions

Pitfall: Over-reliance on a Single Model

Solution: Adopt a multi-model strategy and establish fallbacks to avoid single points of failure.

Pitfall: Neglecting Cost Control

Solution: Set budget caps, track usage in real time, and optimize prompt length.

Pitfall: Poor Context Management

Solution: Use a sliding window strategy, keep only relevant history, and clean context regularly.
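
A sliding window can be as simple as keeping the system message plus the most recent turns that fit a token budget. A minimal sketch, where countTokens stands in for your tokenizer (e.g. tiktoken) and the first message is assumed to be the system message:

function slidingWindow(messages, maxTokens = 4000) {
  const [system, ...history] = messages;
  const kept = [];
  let budget = maxTokens - countTokens(system.content);

  // Walk the history from newest to oldest, keeping what fits
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = countTokens(history[i].content);
    if (cost > budget) break;
    kept.unshift(history[i]);
    budget -= cost;
  }
  return [system, ...kept];
}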

Pitfall: Unstable Outputs

Solution: Lower the temperature, use structured outputs, and add validation logic.

Start Your Best Practices Journey

LLM API provides comprehensive documentation, code examples, and technical support to help you quickly master LLM API best practices and build stable, efficient, and secure AI applications.

Start Free Trial