curl --request POST \
--url https://api.tera.gw/v1/chat/completions \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '
{
"model": "Qwen/Qwen2.5-7B-Instruct",
"messages": [
{
"content": "<string>",
"name": "<string>",
"tool_call_id": "<string>",
"tool_calls": [
{
"id": "<string>",
"type": "function",
"function": {
"name": "<string>",
"arguments": "<string>"
}
}
]
}
],
"max_tokens": 256,
"temperature": 0.7,
"top_p": 0.5,
"top_k": 123,
"stop": "<string>",
"seed": 123,
"frequency_penalty": 0,
"presence_penalty": 0,
"repetition_penalty": 123,
"stream": false,
"tools": [
{
"type": "function"
}
],
"response_format": {}
}
'{
"id": "<string>",
"object": "chat.completion",
"created": 123,
"model": "<string>",
"choices": [
{
"index": 123,
"message": {
"role": "<string>",
"content": "<string>",
"reasoning": "<string>",
"reasoning_content": "<string>",
"tool_calls": [
{
"id": "<string>",
"type": "function",
"function": {
"name": "<string>",
"arguments": "<string>"
}
}
]
}
}
],
"usage": {
"prompt_tokens": 123,
"completion_tokens": 123,
"total_tokens": 123
}
}Chat completions
OpenAI-compatible chat completions endpoint. Set stream: true for
Server-Sent Events. Reasoning models return chain-of-thought traces in
a separate field — reasoning for models using the OpenAI gpt-oss
parser (e.g. openai/gpt-oss-20b), reasoning_content for models
using the qwen3 parser (e.g. Qwen/Qwen3.5-27B). Treat the two as
aliases. See Reasoning models.
curl --request POST \
--url https://api.tera.gw/v1/chat/completions \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '
{
"model": "Qwen/Qwen2.5-7B-Instruct",
"messages": [
{
"content": "<string>",
"name": "<string>",
"tool_call_id": "<string>",
"tool_calls": [
{
"id": "<string>",
"type": "function",
"function": {
"name": "<string>",
"arguments": "<string>"
}
}
]
}
],
"max_tokens": 256,
"temperature": 0.7,
"top_p": 0.5,
"top_k": 123,
"stop": "<string>",
"seed": 123,
"frequency_penalty": 0,
"presence_penalty": 0,
"repetition_penalty": 123,
"stream": false,
"tools": [
{
"type": "function"
}
],
"response_format": {}
}
'{
"id": "<string>",
"object": "chat.completion",
"created": 123,
"model": "<string>",
"choices": [
{
"index": 123,
"message": {
"role": "<string>",
"content": "<string>",
"reasoning": "<string>",
"reasoning_content": "<string>",
"tool_calls": [
{
"id": "<string>",
"type": "function",
"function": {
"name": "<string>",
"arguments": "<string>"
}
}
]
}
}
],
"usage": {
"prompt_tokens": 123,
"completion_tokens": 123,
"total_tokens": 123
}
}Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
HuggingFace model id. See /v1/models.
"Qwen/Qwen2.5-7B-Instruct"
Show child attributes
Show child attributes
Maximum tokens to generate.
256
0 <= x <= 20.7
0 <= x <= 1vLLM-specific. Top-k sampling.
Deterministic seed for sampling.
-2 <= x <= 2-2 <= x <= 2vLLM-specific. Penalty for repeated tokens.
Show child attributes
Show child attributes
none, auto, required Optional response constraints — e.g. {"type": "json_object"}
for JSON mode, or {"type": "json_schema", "json_schema": {...}}
for structured outputs.