Chat Completion API

The core interface for building conversational AI applications. It supports multi-turn conversations, system prompts, function calling, and other advanced features.

POST · Stable Version · Streaming Support

Endpoint

POST https://api.n1n.ai/v1/chat/completions

Request Headers

Header         Value                  Description
Content-Type   application/json       Request body format
Authorization  Bearer YOUR_API_KEY    Authentication token

Request Parameters

Parameter          Type     Required  Description
model              string   Yes       Model ID, e.g. gpt-3.5-turbo, gpt-4
messages           array    Yes       Array of chat messages containing conversation history
temperature        number   No        Sampling temperature, range 0-2, default 1. Higher values make output more random
max_tokens         integer  No        Maximum number of tokens to generate
stream             boolean  No        Enable streaming response, default false
top_p              number   No        Nucleus sampling parameter, range 0-1, default 1
n                  integer  No        Number of responses to generate, default 1
stop               array    No        Stop sequences that halt generation, up to 4
presence_penalty   number   No        Presence penalty, range -2 to 2, default 0
frequency_penalty  number   No        Frequency penalty, range -2 to 2, default 0
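The table above can be sketched as a small helper that assembles a request body and checks the documented parameter ranges before sending. This is an illustrative sketch, not part of any SDK; the function name build_chat_payload is an assumption.

```python
# Sketch: build and validate a request body for /v1/chat/completions.
# Parameter names and ranges mirror the table above.

def build_chat_payload(model, messages, temperature=1.0, max_tokens=None,
                       stream=False, top_p=1.0):
    """Assemble a request body, enforcing the documented parameter ranges."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be in [0, 2]")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be in [0, 1]")
    payload = {"model": model, "messages": messages,
               "temperature": temperature, "stream": stream, "top_p": top_p}
    if max_tokens is not None:  # optional: omit to use the server default
        payload["max_tokens"] = max_tokens
    return payload

payload = build_chat_payload("gpt-3.5-turbo",
                             [{"role": "user", "content": "Hello"}],
                             temperature=0.7, max_tokens=100)
```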

Message Format

Each message object in the messages array includes the following fields:

Field    Type    Description
role     string  Role type: system, user, or assistant
content  string  Message content
name     string  (Optional) Sender name
Message Array Example
[
  {
    "role": "system",
    "content": "You are a helpful AI assistant, skilled at answering various questions."
  },
  {
    "role": "user",
    "content": "What is Machine Learning?"
  },
  {
    "role": "assistant",
    "content": "Machine Learning is a branch of Artificial Intelligence..."
  },
  {
    "role": "user",
    "content": "Can you give me a simple example?"
  }
]
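In practice, a message array like the one above grows turn by turn: each user input and each assistant reply is appended so context carries forward. A minimal sketch of that bookkeeping, including a trim step to keep prompt token usage bounded (the helper names append_turn and trimmed are assumptions, not SDK functions):

```python
# Sketch of multi-turn history handling: append each turn in order, and trim
# older turns (always keeping the system message) to control token costs.

def append_turn(history, role, content):
    history.append({"role": role, "content": content})
    return history

def trimmed(history, max_turns=6):
    """Keep the system message plus the most recent max_turns messages."""
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    return system + rest[-max_turns:]

history = [{"role": "system", "content": "You are a helpful AI assistant."}]
append_turn(history, "user", "What is Machine Learning?")
append_turn(history, "assistant", "Machine Learning is a branch of AI...")
append_turn(history, "user", "Can you give me a simple example?")
```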

Request Example

Basic Request

cURL
curl https://api.n1n.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "Hello! How is the weather today?"
      }
    ],
    "temperature": 0.7
  }'

Streaming Response Request

Python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.n1n.ai/v1"
)

# Streaming response
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a poem about spring"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Response Format

Success Response

{
  "id": "chatcmpl-123456",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "As an AI assistant, I cannot access real-time weather information. I suggest you check weather forecast websites or apps..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 28,
    "total_tokens": 41
  }
}
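The assistant's reply lives at choices[0].message.content, and token accounting at usage. A minimal sketch of pulling both out of a non-streaming response body (the reply text here is a made-up placeholder):

```python
import json

# Sketch: extract the assistant reply and token usage from a success response,
# using the fields shown in the example above.

response_body = """{
  "id": "chatcmpl-123456", "object": "chat.completion", "created": 1677652288,
  "model": "gpt-3.5-turbo-0613",
  "choices": [{"index": 0,
               "message": {"role": "assistant", "content": "Hello there!"},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 13, "completion_tokens": 28, "total_tokens": 41}}"""

data = json.loads(response_body)
reply = data["choices"][0]["message"]["content"]
finish_reason = data["choices"][0]["finish_reason"]  # "stop", "length", ...
total_tokens = data["usage"]["total_tokens"]
```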

Streaming Response Format

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"content":"As"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-3.5-turbo","choices":[{"index":0,"delta":{"content":"AI"},"finish_reason":null}]}

data: [DONE]
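If you are not using an SDK, the raw stream can be assembled by hand: each "data:" line carries a JSON chunk whose choices[0].delta may contain a content fragment, and "data: [DONE]" ends the stream. A minimal sketch (the function name collect_stream is an assumption):

```python
import json

# Sketch: assemble the full reply text from raw SSE lines of a streaming
# response, following the chunk format shown above.

def collect_stream(lines):
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank/keep-alive lines
        body = line[len("data:"):].strip()
        if body == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(body)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):  # first chunk may carry only the role
            parts.append(delta["content"])
    return "".join(parts)
```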

Error Handling

401 Unauthorized

Invalid or missing API key

{"error": {"message": "Invalid API key", "type": "invalid_request_error", "code": "invalid_api_key"}}

429 Too Many Requests

The request rate exceeds the allowed limit

{"error": {"message": "Rate limit exceeded", "type": "rate_limit_error", "code": "rate_limit_exceeded"}}

400 Bad Request

Invalid request parameters

{"error": {"message": "Invalid model specified", "type": "invalid_request_error", "code": "model_not_found"}}

Best Practices

Recommended Practices

  • Set temperature parameter appropriately: use 0.7-1.0 for creative tasks, 0-0.3 for precise tasks
  • Use system messages to define AI's role and behavior guidelines
  • Preserve conversation history to maintain context continuity
  • Use max_tokens to limit response length and control costs
  • Use streaming responses for long conversations to improve user experience
  • Implement error retry mechanism to improve service stability
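The retry recommendation above can be sketched as a wrapper that retries on rate-limit-style failures with exponential backoff. The RateLimitError class and with_retries helper here are illustrative assumptions, not part of any SDK; in real code you would catch your HTTP client's equivalent of a 429 response.

```python
import time

# Sketch: retry a callable on rate-limit errors with exponential backoff.
# RateLimitError stands in for however your HTTP client surfaces a 429.

class RateLimitError(Exception):
    pass

def with_retries(call, max_attempts=3, base_delay=0.01):
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * (2 ** attempt))  # back off: 1x, 2x, 4x, ...
```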