Streaming Response API

Use Server-Sent Events to implement real-time streaming output, allowing users to see generated content immediately

SSE · Real-time Response · Advanced Feature

What is Streaming Response?

Streaming responses let the model return results in real time as it generates content, instead of waiting until the complete response is ready. This greatly improves the user experience, especially when generating long texts.

❌ Traditional Method

  • Must wait for the complete response (may take several seconds)
  • Noticeable delay for the user
  • Risk of timeouts

✅ Streaming Response

  • Content starts displaying immediately
  • Real-time UI updates
  • Better user experience

Enable Streaming Response

Set stream: true in the request to enable streaming response.

Request Example
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "Write an article about Artificial Intelligence"
    }
  ],
  "stream": true  // Enable streaming response
}

Streaming Response Format

Streaming response uses Server-Sent Events (SSE) format, with each data chunk starting with data:.

SSE Data Format
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"content":"Artificial"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"content":" Intelligence"},"finish_reason":null}]}

data: [DONE]

Important Field Descriptions

  • delta: contains incremental content
  • finish_reason: completion reason (stop/length/function_call)
  • [DONE]: indicates stream end
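
If you are not using an SDK, the SSE stream can be parsed by hand. Below is a minimal sketch using Python's third-party requests library; the endpoint, key placeholder, and payload mirror the examples in this guide.

import json
import requests

response = requests.post(
    "https://api.n1n.ai/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",
    },
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Write an article about Artificial Intelligence"}],
        "stream": True,
    },
    stream=True,  # read the body incrementally instead of buffering it all
)

for line in response.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue  # skip SSE keep-alive blank lines and comments
    data = line[len("data: "):]
    if data == "[DONE]":
        break  # end-of-stream sentinel
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"]
    if delta.get("content"):
        print(delta["content"], end="", flush=True)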

Implementation Examples

Python Implementation

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.n1n.ai/v1"
)

# Create streaming response
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

# Handle streaming response
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

JavaScript Implementation

async function streamChat(message) {
  const response = await fetch('https://api.n1n.ai/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_API_KEY'
    },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo',
      messages: [{ role: 'user', content: message }],
      stream: true
    })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';  // carries a partial line over to the next read

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // A read may end mid-line, so keep the unfinished tail in the buffer
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop();
    
    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') {
          console.log('Stream finished');
          return;
        }
        
        try {
          const parsed = JSON.parse(data);
          const content = parsed.choices[0].delta.content;
          if (content) {
            process.stdout.write(content);
          }
        } catch (e) {
          console.error('Error parsing:', e);
        }
      }
    }
  }
}

React Component Example

import { useState } from 'react';

function ChatComponent() {
  const [messages, setMessages] = useState([]);
  const [isStreaming, setIsStreaming] = useState(false);

  async function sendMessage(content) {
    setIsStreaming(true);
    const newMessage = { role: 'assistant', content: '' };
    setMessages(prev => [...prev, { role: 'user', content }, newMessage]);

    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: content, stream: true })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // Assumes the /api/chat backend forwards plain text deltas;
      // if it proxies raw SSE, parse the "data:" lines as shown above
      const chunk = decoder.decode(value, { stream: true });
      setMessages(prev => {
        const msgs = [...prev];
        const last = msgs[msgs.length - 1];
        // Replace the last message rather than mutating state in place
        msgs[msgs.length - 1] = { ...last, content: last.content + chunk };
        return msgs;
      });
    }
    
    setIsStreaming(false);
  }

  return (
    <div>
      {messages.map((msg, i) => (
        <div key={i} className={msg.role}>
          {msg.content}
        </div>
      ))}
      {isStreaming && <div>AI is thinking...</div>}
    </div>
  );
}

Best Practices

Error Handling

try {
  // Handle stream
} catch (error) {
  if (error.name === 'AbortError') {
    console.log('Stream aborted');
  } else {
    console.error('Stream error:', error);
  }
}
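
The same pattern in Python wraps both the initial request and the iteration over chunks. A minimal sketch using the openai SDK shown above (exception names per openai>=1.0):

from openai import OpenAI, APIConnectionError, APIError

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.n1n.ai/v1")

try:
    stream = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Tell me a story"}],
        stream=True,
    )
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
except APIConnectionError as e:
    # Network failure, including a connection dropped mid-stream
    print(f"\nConnection error: {e}")
except APIError as e:
    # Non-2xx status, or an error event delivered inside the stream
    print(f"\nAPI error: {e}")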

Cancel Stream

const controller = new AbortController();

// Start stream
fetch(url, {
  signal: controller.signal,
  // ...other options
});

// Cancel stream
controller.abort();
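
On the Python side there is no AbortController; breaking out of the loop and closing the stream releases the connection. A sketch, assuming a recent openai SDK where the stream object exposes close():

stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

collected = []
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        collected.append(content)
    if sum(len(part) for part in collected) > 200:  # example stop condition
        stream.close()  # close the underlying HTTP response early
        break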

Buffer Processing

For high-frequency updates, consider using buffers to batch UI updates and avoid performance issues.
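
For example, deltas can be collected and flushed on a fixed interval instead of on every chunk. A sketch in Python, where `stream` is created as in the examples above and update_ui is a hypothetical callback standing in for your actual UI-update code:

import time

BATCH_INTERVAL = 0.1  # at most ~10 UI updates per second
pending = []
last_flush = time.monotonic()

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        pending.append(content)
    now = time.monotonic()
    if pending and now - last_flush >= BATCH_INTERVAL:
        update_ui("".join(pending))  # hypothetical UI-update callback
        pending.clear()
        last_flush = now

if pending:
    update_ui("".join(pending))  # flush whatever remains at stream end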

Important Notes

Important Reminders

  • Streaming responses do not support the n > 1 parameter
  • Some proxies or firewalls may not support SSE
  • Token usage for a streaming response is only reported at the end of the stream
  • Handle connection interruptions and reconnections properly (see the retry sketch below)
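
A simple retry wrapper for dropped connections might look like the sketch below (Python, openai>=1.0). Note that chat completion streams cannot be resumed mid-generation, so a retry restarts the response from the beginning:

import time

from openai import APIConnectionError

def stream_with_retry(client, messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            stream = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=messages,
                stream=True,
            )
            # A retry restarts generation; any partial output is discarded
            return "".join(
                chunk.choices[0].delta.content or "" for chunk in stream
            )
        except APIConnectionError:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s...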