Streaming Response API
Use Server-Sent Events (SSE) to stream output in real time, so users see generated content as soon as it is produced
SSE · Real-time Response · Advanced Feature
What is Streaming Response?
Streaming lets the model return partial results in real time as it generates content, instead of waiting for the complete response before returning anything. This greatly improves user experience, especially when generating long text.
❌ Traditional Method
- Wait for the complete response (can take several seconds)
- Noticeable delay for users
- Risk of request timeouts
✅ Streaming Response
- Content starts displaying immediately
- Real-time UI updates
- Better user experience
Enable Streaming Response
Set stream: true in the request body to enable a streaming response.
Request Example
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "Write an article about Artificial Intelligence"
    }
  ],
  "stream": true  // Enable streaming response
}

Streaming Response Format
Streaming responses use the Server-Sent Events (SSE) format; each data chunk is prefixed with data:.
SSE Data Format
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"content":"Artificial"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"content":" Intelligence"},"finish_reason":null}]}
data: [DONE]

Important Field Descriptions
- delta: contains the incremental content for this chunk
- finish_reason: the reason the response finished (stop / length / function_call); see the sketch below
- [DONE]: indicates the end of the stream
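As a minimal illustration of these fields, the sketch below consumes one parsed chunk object (the JSON shown above) and reads delta.content and finish_reason. The handleChunk name and the state object are our own illustrative choices, not part of the API.

// Minimal sketch: consume one parsed SSE chunk (handleChunk is illustrative)
function handleChunk(chunk, state) {
  const choice = chunk.choices[0];
  if (choice.delta.content) {
    state.text += choice.delta.content;   // accumulate incremental content
  }
  if (choice.finish_reason) {
    // stop: natural end; length: max_tokens reached; function_call: model invoked a function
    state.finished = choice.finish_reason;
  }
  return state;
}

// Usage: const state = { text: '', finished: null };
// then call handleChunk(parsed, state) for each parsed data line.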
Implementation Examples
Python Implementation
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.n1n.ai/v1"
)

# Create a streaming response
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

# Print each incremental piece of content as it arrives
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

JavaScript Implementation
async function streamChat(message) {
  const response = await fetch('https://api.n1n.ai/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_API_KEY'
    },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo',
      messages: [{ role: 'user', content: message }],
      stream: true
    })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // { stream: true } keeps multi-byte characters intact across chunk boundaries
    const chunk = decoder.decode(value, { stream: true });
    const lines = chunk.split('\n');

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') {
          console.log('Stream finished');
          return;
        }
        try {
          const parsed = JSON.parse(data);
          const content = parsed.choices[0].delta.content;
          if (content) {
            process.stdout.write(content); // Node.js; update the DOM in a browser
          }
        } catch (e) {
          // A data line can be split across network chunks; production code should buffer partial lines
          console.error('Error parsing:', e);
        }
      }
    }
  }
}

React Component Example
import { useState } from 'react';

function ChatComponent() {
  const [messages, setMessages] = useState([]);
  const [isStreaming, setIsStreaming] = useState(false);

  async function sendMessage(content) {
    setIsStreaming(true);
    const newMessage = { role: 'assistant', content: '' };
    setMessages(prev => [...prev, { role: 'user', content }, newMessage]);

    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: content, stream: true })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // Assumes the /api/chat backend relays plain text; parse SSE here if it forwards raw events
      const chunk = decoder.decode(value, { stream: true });
      setMessages(prev => {
        const msgs = [...prev];
        const last = msgs[msgs.length - 1];
        // Replace the last message with a new object instead of mutating state in place
        msgs[msgs.length - 1] = { ...last, content: last.content + chunk };
        return msgs;
      });
    }
    setIsStreaming(false);
  }

  return (
    <div>
      {messages.map((msg, i) => (
        <div key={i} className={msg.role}>
          {msg.content}
        </div>
      ))}
      {isStreaming && <div>AI is thinking...</div>}
    </div>
  );
}

Best Practices
Error Handling
try {
  // Consume the stream, e.g. the streamChat function from the example above
  await streamChat('Hello');
} catch (error) {
  if (error.name === 'AbortError') {
    console.log('Stream aborted');
  } else {
    console.error('Stream error:', error);
  }
}

Cancel Stream
const controller = new AbortController();

// Start the stream with an abort signal attached
fetch(url, {
  signal: controller.signal,
  // ...other options
});

// Cancel the stream (e.g. from a "Stop" button)
controller.abort();
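A common extension (our own suggestion, not part of the API) is to abort automatically if the stream runs too long:

// Illustrative: abort the request if it has not finished within 30 seconds
const watchdog = setTimeout(() => controller.abort(), 30000);
// Call clearTimeout(watchdog) once the stream completes normally.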
Buffer Processing
For high-frequency updates, consider using buffers to batch UI updates and avoid performance issues; a minimal sketch follows.
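A minimal sketch of this idea, assuming a browser environment; createTokenBuffer and onRender are illustrative names, not part of any API:

// Buffer incoming tokens and flush them at most once per animation frame
function createTokenBuffer(onRender) {
  let buffer = '';
  let scheduled = false;

  return function push(token) {
    buffer += token;
    if (!scheduled) {
      scheduled = true;
      requestAnimationFrame(() => {
        onRender(buffer);   // one UI update per frame instead of one per token
        buffer = '';
        scheduled = false;
      });
    }
  };
}

// Usage: const push = createTokenBuffer(text => { el.textContent += text; });
// Call push(content) for every streamed delta.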
Important Notes
Important Reminders
- Streaming responses do not support the n > 1 parameter
- Some proxies or firewalls may not support SSE
- Token usage for a streaming response is reported only at the end of the stream
- Ensure proper handling of connection interruptions and reconnections (see the sketch below)
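As a sketch of the last point, a simple retry wrapper around the streamChat function from the JavaScript example might look like this (the backoff policy is our own assumption):

// Illustrative retry wrapper: re-attempt the stream a few times with backoff
async function streamWithRetry(message, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      await streamChat(message);
      return; // stream completed normally
    } catch (error) {
      if (attempt === maxRetries) throw error;
      // Exponential backoff before reconnecting: 1s, 2s, 4s, ...
      await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** (attempt - 1)));
    }
  }
}

Note that a naive retry restarts the stream from scratch and may display duplicated content; production code should track what has already been rendered before reconnecting.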