

Set "stream": true to receive tokens incrementally. Tera streams responses as Server-Sent Events on the same /v1/chat/completions endpoint.

Wire format

Each event is a single line prefixed with data: carrying a JSON chunk delta; events are separated by a blank line. The stream terminates with data: [DONE].
data: {"id":"...","object":"chat.completion.chunk","created":1700000000,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"...","object":"chat.completion.chunk","created":1700000000,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"...","object":"chat.completion.chunk","created":1700000000,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}

data: {"id":"...","object":"chat.completion.chunk","created":1700000000,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
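If you are not using an SDK, the wire format above can be parsed in a few lines. This is a sketch, not Tera-specific code; it assumes each data: line carries one complete JSON chunk, as shown above:

```python
import json
from typing import Optional


def parse_sse_line(line: str) -> Optional[dict]:
    """Parse one SSE line into a chunk dict.

    Returns None for blank separator lines, SSE comments,
    and the [DONE] sentinel.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None  # blank separator line or SSE comment
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None  # end-of-stream sentinel
    return json.loads(payload)


# Example: extract the text delta from one event.
chunk = parse_sse_line(
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}'
)
print(chunk["choices"][0]["delta"]["content"])  # Hello
```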

Reading the stream

from openai import OpenAI

client = OpenAI(base_url="https://api.tera.gw/v1", api_key="sk-tera-...")

stream = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

Finish reasons

The final non-[DONE] event has a non-null finish_reason:
  • stop — natural end of generation
  • length — hit max_tokens or the model’s max context
  • tool_calls — the model emitted a tool call (see Tool calling)
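In a streaming loop this means accumulating deltas until a chunk arrives with a non-null finish_reason. A sketch over already-parsed chunk dicts (SDK objects expose the same fields as attributes):

```python
def collect_stream(chunks):
    """Accumulate content deltas and return (text, finish_reason)."""
    parts, finish_reason = [], None
    for chunk in chunks:
        choice = chunk["choices"][0]
        parts.append(choice["delta"].get("content") or "")
        if choice["finish_reason"] is not None:
            finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


# The four example chunks from "Wire format" reduce to:
chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": " there"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]
text, reason = collect_stream(chunks)
print(text, reason)  # Hello there stop
```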

Streaming with reasoning models

Reasoning models (e.g. Qwen/Qwen3.5-27B) stream reasoning_content deltas before the visible content. See Reasoning.
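A sketch of routing the two delta fields into separate buffers, assuming reasoning_content appears as a sibling of content in the delta as described above (the example chunks are illustrative, not captured output):

```python
def split_reasoning(chunks):
    """Separate reasoning_content deltas from visible content deltas."""
    reasoning, visible = [], []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        reasoning.append(delta.get("reasoning_content") or "")
        visible.append(delta.get("content") or "")
    return "".join(reasoning), "".join(visible)


# Hypothetical chunk sequence: reasoning streams first, then the answer.
chunks = [
    {"choices": [{"index": 0, "delta": {"reasoning_content": "Let me count..."}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "1 2 3 4 5"}, "finish_reason": "stop"}]},
]
thinking, answer = split_reasoning(chunks)
```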

Operational notes

  • Heartbeats — we do not currently send keep-alive comments. Configure client read timeouts above your expected longest generation (server-side default: 120s).
  • Disconnects — if the client disconnects mid-stream, generation is cancelled on the backend.
  • HTTP/2 — Tera supports HTTP/2; SDK defaults are fine.
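Given the note on heartbeats, it is worth setting the SDK's read timeout explicitly rather than relying on its default. A configuration sketch with the OpenAI Python SDK, which accepts an httpx.Timeout; the 300-second read value is an illustrative choice, not a recommendation:

```python
import httpx
from openai import OpenAI

# No keep-alive comments are sent mid-stream, so the read timeout must
# cover the longest expected generation (server-side default: 120s).
client = OpenAI(
    base_url="https://api.tera.gw/v1",
    api_key="sk-tera-...",
    timeout=httpx.Timeout(connect=10.0, read=300.0, write=10.0, pool=10.0),
)
```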