Streaming Response API

Use Server-Sent Events to implement real-time streaming output, allowing users to see generated content immediately

SSE · Real-time Response · Advanced Feature

What is Streaming Response?

Streaming responses let the model return results in real time as it generates content, instead of waiting until the complete response is ready. This greatly improves the user experience, especially when generating long texts.

❌ Traditional Method

  • Must wait for the complete response (may take several seconds)
  • Noticeable delay for the user
  • Risk of timeouts

✅ Streaming Response

  • Content starts displaying immediately
  • Real-time UI updates
  • Better user experience

Enable Streaming Response

Set stream: true in the request to enable streaming response.

Request Example
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "Write an article about Artificial Intelligence"
    }
  ],
  "stream": true  // Enable streaming response
}

Streaming Response Format

Streaming response uses Server-Sent Events (SSE) format, with each data chunk starting with data:.

SSE Data Format
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"content":"Artificial"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"content":" Intelligence"},"finish_reason":null}]}

data: [DONE]

Important Field Descriptions

  • delta: contains incremental content
  • finish_reason: completion reason (stop/length/function_call)
  • [DONE]: indicates stream end
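
If you are not using an SDK, the SSE stream can be parsed by hand. Below is a minimal sketch using Python's third-party requests library; the endpoint, key placeholder, and payload mirror the examples in this guide.

import json
import requests

response = requests.post(
    "https://api.n1n.ai/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",
    },
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Write an article about Artificial Intelligence"}],
        "stream": True,
    },
    stream=True,  # read the body incrementally instead of buffering it all
)

for line in response.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue  # skip SSE keep-alive blank lines and comments
    data = line[len("data: "):]
    if data == "[DONE]":
        break  # end-of-stream sentinel
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"]
    if delta.get("content"):
        print(delta["content"], end="", flush=True)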

Implementation Examples

Python Implementation

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.n1n.ai/v1"
)

# Create streaming response
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

# Handle streaming response
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

JavaScript Implementation

async function streamChat(message) {
  const response = await fetch('https://api.n1n.ai/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_API_KEY'
    },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo',
      messages: [{ role: 'user', content: message }],
      stream: true
    })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';  // carries a partial line over to the next read

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // A read may end mid-line, so keep the unfinished tail in the buffer
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop();
    
    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') {
          console.log('Stream finished');
          return;
        }
        
        try {
          const parsed = JSON.parse(data);
          const content = parsed.choices[0].delta.content;
          if (content) {
            process.stdout.write(content);
          }
        } catch (e) {
          console.error('Error parsing:', e);
        }
      }
    }
  }
}

React Component Example

import { useState } from 'react';

function ChatComponent() {
  const [messages, setMessages] = useState([]);
  const [isStreaming, setIsStreaming] = useState(false);

  async function sendMessage(content) {
    setIsStreaming(true);
    const newMessage = { role: 'assistant', content: '' };
    setMessages(prev => [...prev, { role: 'user', content }, newMessage]);

    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: content, stream: true })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // Assumes the /api/chat backend forwards plain text deltas;
      // if it proxies raw SSE, parse the "data:" lines as shown above
      const chunk = decoder.decode(value, { stream: true });
      setMessages(prev => {
        const msgs = [...prev];
        const last = msgs[msgs.length - 1];
        // Replace the last message rather than mutating state in place
        msgs[msgs.length - 1] = { ...last, content: last.content + chunk };
        return msgs;
      });
    }
    
    setIsStreaming(false);
  }

  return (
    <div>
      {messages.map((msg, i) => (
        <div key={i} className={msg.role}>
          {msg.content}
        </div>
      ))}
      {isStreaming && <div>AI is thinking...</div>}
    </div>
  );
}

Best Practices

Error Handling

try {
  // Handle stream
} catch (error) {
  if (error.name === 'AbortError') {
    console.log('Stream aborted');
  } else {
    console.error('Stream error:', error);
  }
}
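
The same pattern in Python wraps both the initial request and the iteration over chunks. A minimal sketch using the openai SDK shown above (exception names per openai>=1.0):

from openai import OpenAI, APIConnectionError, APIError

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.n1n.ai/v1")

try:
    stream = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Tell me a story"}],
        stream=True,
    )
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
except APIConnectionError as e:
    # Network failure, including a connection dropped mid-stream
    print(f"\nConnection error: {e}")
except APIError as e:
    # Non-2xx status, or an error event delivered inside the stream
    print(f"\nAPI error: {e}")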

Cancel Stream

const controller = new AbortController();

// Start stream
fetch(url, {
  signal: controller.signal,
  // ...other options
});

// Cancel stream
controller.abort();
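
On the Python side there is no AbortController; breaking out of the loop and closing the stream releases the connection. A sketch, assuming a recent openai SDK where the stream object exposes close():

stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

collected = []
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        collected.append(content)
    if sum(len(part) for part in collected) > 200:  # example stop condition
        stream.close()  # close the underlying HTTP response early
        break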

Buffer Processing

For high-frequency updates, consider using buffers to batch UI updates and avoid performance issues.
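
For example, deltas can be collected and flushed on a fixed interval instead of on every chunk. A sketch in Python, where `stream` is created as in the examples above and update_ui is a hypothetical callback standing in for your actual UI-update code:

import time

BATCH_INTERVAL = 0.1  # at most ~10 UI updates per second
pending = []
last_flush = time.monotonic()

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        pending.append(content)
    now = time.monotonic()
    if pending and now - last_flush >= BATCH_INTERVAL:
        update_ui("".join(pending))  # hypothetical UI-update callback
        pending.clear()
        last_flush = now

if pending:
    update_ui("".join(pending))  # flush whatever remains at stream end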

Important Notes

Important Reminders

  • Streaming responses do not support the n > 1 parameter
  • Some proxies or firewalls may not support SSE
  • Token usage for a streaming response is only reported at the end of the stream
  • Handle connection interruptions and reconnections properly (see the retry sketch below)
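
A simple retry wrapper for dropped connections might look like the sketch below (Python, openai>=1.0). Note that chat completion streams cannot be resumed mid-generation, so a retry restarts the response from the beginning:

import time

from openai import APIConnectionError

def stream_with_retry(client, messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            stream = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=messages,
                stream=True,
            )
            # A retry restarts generation; any partial output is discarded
            return "".join(
                chunk.choices[0].delta.content or "" for chunk in stream
            )
        except APIConnectionError:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s...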