Text-to-Speech (TTS) API Tutorial

High-quality speech synthesis with multiple voices and multilingual support for natural-sounding output

6 voice options

Natural human voices

Adjustable speed

0.25x–4.0x

Streaming playback

Real-time audio output

HD quality

High-fidelity output

1. Basic speech synthesis

Getting Started

import openai
from pathlib import Path

client = openai.OpenAI(
    api_key="your-api-key",
    base_url="https://api.n1n.ai/v1"
)

# Basic text-to-speech
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",  # Options: alloy, echo, fable, onyx, nova, shimmer
    input="Welcome to N1N's text-to-speech API, supporting multiple voices and languages."
)

# Save audio file (write_to_file replaces the deprecated stream_to_file on non-streaming responses)
response.write_to_file("output.mp3")

# High-quality version
response_hd = client.audio.speech.create(
    model="tts-1-hd",  # Higher quality with slightly higher latency
    voice="nova",
    speed=1.0,  # Speech rate 0.25-4.0
    input="This is a high-quality speech synthesis demo."
)

response_hd.write_to_file("output_hd.mp3")
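
The speed parameter above is documented as accepting 0.25 to 4.0. A minimal sketch of keeping a user-supplied rate inside that range before it reaches the API; `clamp_speed` and the two constants are hypothetical names introduced here, not part of the SDK.

```python
MIN_SPEED = 0.25  # lower bound stated in this tutorial
MAX_SPEED = 4.0   # upper bound stated in this tutorial


def clamp_speed(requested: float) -> float:
    """Limit a requested speech rate to the supported range (hypothetical helper)."""
    return max(MIN_SPEED, min(MAX_SPEED, requested))
```

The result can be passed straight through, for example speed=clamp_speed(user_speed). Clamping silently is one design choice; raising a ValueError instead may suit applications that prefer to surface bad input.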

2. Voice selection guide

Alloy

Neutral, balanced voice

Best for: General scenarios, news reporting

Echo

Male, deep and resonant

Best for: Serious content, educational videos

Fable

British accent, elegant

Best for: Audiobooks, story narration

Onyx

Male, deep and magnetic

Best for: Podcasts, documentaries

Nova

Female, clear and friendly

Best for: Customer service, navigation systems

Shimmer

Female, warm and friendly

Best for: Children's content, assistants
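
The guide above can be expressed as a lookup table so that application code selects a voice by scenario instead of hard-coding a name. This is only a sketch: the use-case keys and the `pick_voice` helper are illustrative assumptions, while the voice names come from the list above.

```python
# Voice names are from the guide above; the use-case keys are illustrative labels
VOICE_BY_USE_CASE = {
    "news": "alloy",
    "education": "echo",
    "audiobook": "fable",
    "podcast": "onyx",
    "customer_service": "nova",
    "children": "shimmer",
}


def pick_voice(use_case: str, default: str = "alloy") -> str:
    """Return the suggested voice for a use case, falling back to a default."""
    return VOICE_BY_USE_CASE.get(use_case.strip().lower(), default)
```

For example, voice=pick_voice("audiobook") would select the fable voice, and an unrecognized label falls back to alloy.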

3. Streaming audio playback

Real-time playback

import io

import openai
import pygame

# Uses the `client` created in section 1

# Fetch the audio, then play it
def stream_and_play_audio(text: str, voice: str = "alloy"):
    response = client.audio.speech.create(
        model="tts-1",
        voice=voice,
        input=text,
        response_format="mp3"
    )

    # Initialize pygame
    pygame.mixer.init()

    # Buffer the complete response in memory
    # (playback starts only after the whole file has been received)
    audio_stream = io.BytesIO(response.content)

    # Load and play
    pygame.mixer.music.load(audio_stream)
    pygame.mixer.music.play()

    # Wait for playback to finish, polling about 10 times per second
    clock = pygame.time.Clock()
    while pygame.mixer.music.get_busy():
        clock.tick(10)

# Real-time voice assistant
# Awaiting the request requires the async client; the sync client's create() is not awaitable
async_client = openai.AsyncOpenAI(
    api_key="your-api-key",
    base_url="https://api.n1n.ai/v1"
)

class VoiceAssistant:
    def __init__(self):
        self.voice = "nova"
        self.speed = 1.0

    async def speak(self, text: str):
        """Asynchronous speech output"""
        response = await async_client.audio.speech.create(
            model="tts-1",
            voice=self.voice,
            speed=self.speed,
            input=text
        )

        # Hand the audio to the playback hook
        await self.play_audio_stream(response)

    async def play_audio_stream(self, audio_data):
        # Implement streaming audio playback here
        pass

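// Node.js streaming example
// Assumes `openai` is a configured client, `text` is the input string,
// and `audioPlayer` is a writable stream
import { Readable } from "node:stream";

const response = await openai.audio.speech.create({
  model: "tts-1",
  voice: "alloy",
  input: text
});

// Pipe the response body to an audio player.
// On older SDK versions built on node-fetch, response.body is already
// a Node stream and can be piped directly.
Readable.fromWeb(response.body).pipe(audioPlayer);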
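
The Python playback example above reads the entire response before any audio is used. To handle audio as it arrives, the OpenAI Python SDK provides `with_streaming_response`, whose response object yields chunks through `iter_bytes`. The sketch below keeps the chunk-handling logic in a small function so it can be exercised without a network call. `write_audio_chunks` is a hypothetical helper, and the file name and chunk size in the usage comment are arbitrary choices.

```python
from typing import BinaryIO, Iterable


def write_audio_chunks(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Write audio chunks to `sink` as they arrive and return the total bytes written."""
    total = 0
    for chunk in chunks:
        if not chunk:
            continue  # skip empty chunks
        sink.write(chunk)
        total += len(chunk)
    return total


# Usage sketch (needs network access and a valid key, so it is left commented out):
# with client.audio.speech.with_streaming_response.create(
#     model="tts-1", voice="alloy", input="Hello", response_format="mp3"
# ) as response, open("streamed.mp3", "wb") as f:
#     write_audio_chunks(response.iter_bytes(chunk_size=4096), f)
```

The sink can be any binary file-like object, so the same function could feed a file, a socket, or a player's input pipe.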

4. Multilingual support

Internationalization use cases

# Multilingual support
languages = {
    "english": "Hello, this is English text to speech.",
    "chinese": "你好, 这是中文语音合成。",
    "japanese": "こんにちは、これは日本語の音声合成です。",
    "korean": "안녕하세요, 이것은 한국어 음성 합성입니다.",
    "spanish": "Hola, esta es la síntesis de voz en español.",
    "french": "Bonjour, ceci est la synthèse vocale française.",
    "german": "Hallo, das ist deutsche Sprachsynthese.",
    "russian": "Привет, это русский синтез речи."
}

# Batch-generate multilingual audio
for lang, text in languages.items():
    response = client.audio.speech.create(
        model="tts-1-hd",
        voice="nova",  # Nova voice has good multilingual support
        input=text
    )
    response.write_to_file(f"output_{lang}.mp3")
    print(f"Generated {lang} audio")

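# SSML-style input (advanced feature)
# Support depends on the model behind the endpoint: OpenAI's tts-1 and tts-1-hd
# are not documented to interpret SSML, so confirm the behavior before relying on it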
ssml_text = """
<speak>
    <prosody rate="slow">Speak this part more slowly.</prosody>
    <break time="500ms"/>
    <prosody pitch="+2st">Make this part higher pitch.</prosody>
    <emphasis level="strong">This is important!</emphasis>
</speak>
"""

response = client.audio.speech.create(
    model="tts-1-hd",
    voice="alloy",
    input=ssml_text,
    response_format="mp3"
)
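
If the model behind the endpoint does not interpret SSML, the markup may be read aloud or ignored. One possible fallback, sketched here as an assumption and not as documented API behavior, is to strip the tags and send plain text. `ssml_to_plain_text` is a hypothetical helper built on the standard library XML parser.

```python
import xml.etree.ElementTree as ET


def ssml_to_plain_text(ssml: str) -> str:
    """Drop SSML markup and return the spoken text with whitespace collapsed."""
    root = ET.fromstring(ssml.strip())
    # itertext() walks every text node in document order
    return " ".join("".join(root.itertext()).split())
```

The stripped text can then be passed as input in place of the SSML string. Prosody, pauses, and emphasis are lost in this fallback; only the words are kept.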

5. Use cases

📚 Content creation

  • ✅ Audiobooks
  • ✅ Podcast voiceovers
  • ✅ Video narration
  • ✅ Course explanations

🤖 Intelligent interactions

  • ✅ Voice assistants
  • ✅ Customer support
  • ✅ Navigation announcements
  • ✅ Accessibility applications