API Reference
Squish exposes an OpenAI-compatible REST API on http://localhost:11435 by default.
Authentication
By default the server accepts requests without an API key.
To require a key, set the `SQUISH_API_KEY` environment variable before starting the server, then pass the key in the standard `Authorization: Bearer <key>` header on every request.
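As a minimal client-side sketch, the helper below builds request headers and adds the bearer token only when a key is available. Reading the key from `SQUISH_API_KEY` on the client is an assumption made for convenience (this reference documents the variable only as a server setting), and `build_headers` is a hypothetical name.

```python
import os


def build_headers(api_key=None):
    """Return HTTP headers for a Squish request.

    Falls back to the SQUISH_API_KEY environment variable (a client-side
    convenience assumed here) and omits Authorization when no key is set.
    """
    key = api_key if api_key is not None else os.environ.get("SQUISH_API_KEY")
    headers = {"Content-Type": "application/json"}
    if key:
        headers["Authorization"] = f"Bearer {key}"
    return headers
```

Passing an empty string explicitly disables the header even when the environment variable is set.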
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `SQUISH_API_KEY` | (none) | When set, all API requests must supply `Authorization: Bearer <key>` |
| `HF_TOKEN` | (none) | HuggingFace access token, required for gated models |
| `SQUISH_OFFLINE` | `0` | Set to `1` to disable all network access (model must already be cached) |
| `SQUISH_CACHE_DIR` | `~/.squish/models` | Override the default model cache directory |
Endpoints
GET /v1/models
Lists all locally available models.
Response
```json
{
  "object": "list",
  "data": [
    {
      "id": "llama3.1:8b",
      "object": "model",
      "created": 1720000000,
      "owned_by": "squish"
    }
  ]
}
```
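To illustrate how a client might consume this response, the sketch below pulls the model IDs out of the JSON body. The function name `model_ids` is hypothetical, and the sample string simply mirrors the response shown above.

```python
import json


def model_ids(response_text):
    """Extract model IDs from a GET /v1/models response body."""
    payload = json.loads(response_text)
    if payload.get("object") != "list":
        raise ValueError("unexpected response: expected an object of type 'list'")
    return [entry["id"] for entry in payload.get("data", [])]


# Sample body copied from the response documented above.
sample = (
    '{"object": "list", "data": [{"id": "llama3.1:8b", "object": "model", '
    '"created": 1720000000, "owned_by": "squish"}]}'
)
print(model_ids(sample))  # prints ['llama3.1:8b']
```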
POST /v1/chat/completions
OpenAI-compatible chat completion.
Request body
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | string | ✅ | (none) | Model ID (e.g. `llama3.1:8b`) |
| `messages` | array | ✅ | (none) | Array of `{"role": "...", "content": "..."}` |
| `max_tokens` | integer | | 512 | Maximum tokens to generate |
| `temperature` | float | | 0.7 | Sampling temperature |
| `top_p` | float | | 0.9 | Top-p nucleus sampling |
| `stream` | boolean | | false | Stream tokens via SSE |
| `stop` | string/array | | null | Stop sequence(s) |
Example
```bash
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "What is MLX?"}
    ],
    "max_tokens": 128,
    "temperature": 0.5
  }'
```
Response
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1720000001,
  "model": "llama3.1:8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "MLX is Apple's machine-learning framework optimised for Apple Silicon..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 42,
    "total_tokens": 66
  }
}
```
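The same request can be issued from Python using only the standard library. This is a rough sketch rather than an official client: `build_chat_request`, `extract_reply`, and `chat` are hypothetical names, and rejecting unknown fields on the client side is a design choice made here, not documented server behaviour.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11435"  # default address from this reference


def build_chat_request(model, messages, **options):
    """Build the JSON body; optional fields are sent only when supplied."""
    allowed = {"max_tokens", "temperature", "top_p", "stream", "stop"}
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    return {"model": model, "messages": messages, **options}


def extract_reply(response):
    """Return the assistant text and total token count from a response dict."""
    choice = response["choices"][0]
    return choice["message"]["content"], response["usage"]["total_tokens"]


def chat(model, messages, **options):
    """POST to /v1/chat/completions and return (text, total_tokens)."""
    body = json.dumps(build_chat_request(model, messages, **options)).encode()
    request = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,  # supplying data makes urllib send a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_reply(json.load(reply))
```

With a server running, `chat("llama3.1:8b", [{"role": "user", "content": "What is MLX?"}], max_tokens=128)` would return the reply text and token total. Fields left out of the call are omitted from the body, so the server-side defaults in the table apply.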
POST /v1/completions
Text completion (non-chat). Supports single prompts and batched requests.
Request body
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | string | ✅ | (none) | Model ID |
| `prompt` | string | ✅* | (none) | Single prompt text |
| `batch` | array | ✅* | (none) | Array of prompt strings (mutually exclusive with `prompt`) |
| `max_tokens` | integer | | 256 | Maximum tokens per completion |
| `temperature` | float | | 0.7 | Sampling temperature |
| `top_p` | float | | 0.9 | Top-p nucleus sampling |
*Either prompt or batch is required.
Single prompt example
```bash
curl http://localhost:11435/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1:8b", "prompt": "Once upon a time"}'
```
Batch example
```bash
curl http://localhost:11435/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "batch": ["The sky is", "The ocean is", "The forest is"],
    "max_tokens": 32
  }'
```
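Because `prompt` and `batch` are mutually exclusive, a client may want to check that before sending anything. The sketch below does so when building the request body. `build_completion_request` is a hypothetical helper, and how the server itself responds to a body containing both fields is not specified here.

```python
def build_completion_request(model, prompt=None, batch=None, **options):
    """Build a /v1/completions body with exactly one of prompt or batch."""
    # True when both are missing or both are present: either way invalid.
    if (prompt is None) == (batch is None):
        raise ValueError("supply exactly one of 'prompt' or 'batch'")
    body = {"model": model, **options}
    if prompt is not None:
        body["prompt"] = prompt
    else:
        body["batch"] = list(batch)
    return body
```

For example, `build_completion_request("llama3.1:8b", batch=["The sky is", "The ocean is"], max_tokens=32)` produces a body equivalent to the batch request above.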
GET /health
Liveness probe.
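A startup script could poll this endpoint before sending real traffic. The sketch below treats any HTTP 200 as "alive", which is an assumption because the response body is not documented here. The names `is_healthy` and `wait_until_healthy`, and the retry counts, are illustrative only.

```python
import time
import urllib.request


def is_healthy(base_url="http://localhost:11435", opener=urllib.request.urlopen):
    """Return True when GET /health answers with HTTP 200."""
    try:
        with opener(f"{base_url}/health", timeout=2) as reply:
            return reply.status == 200
    except OSError:
        # Covers connection failures and urllib's URLError/HTTPError.
        return False


def wait_until_healthy(probe=is_healthy, attempts=10, delay=0.5, sleep=time.sleep):
    """Poll the probe until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # pause between attempts, not after the last one
    return False
```

The `opener` and `sleep` parameters exist so the functions can be exercised without a running server.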
Streaming
Set "stream": true in a /v1/chat/completions request to receive tokens via Server-Sent Events (SSE), exactly like the OpenAI API:
```bash
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.1:8b","messages":[{"role":"user","content":"Count to 5"}],"stream":true}'
```
Each SSE event is a JSON delta. The stream ends with data: [DONE].
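A client consuming the stream needs to strip the `data:` prefix, stop at the `[DONE]` sentinel, and decode each JSON delta. The sketch below shows one way to do that. It assumes each chunk follows the OpenAI streaming layout (`choices[0].delta.content`), which this reference implies but does not spell out, and `iter_stream_text` is a hypothetical name.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        # Assumed OpenAI-style chunk: delta.content may be absent (e.g. role-only).
        fragment = chunk["choices"][0].get("delta", {}).get("content")
        if fragment:
            yield fragment
```

Iterating over the response object returned by `urllib.request.urlopen` yields byte lines, so `"".join(iter_stream_text(reply))` would reassemble the full message.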
Error responses
| HTTP status | Meaning |
|---|---|
| 400 | Bad request — missing or invalid fields |
| 401 | Unauthorized — invalid or missing API key |
| 404 | Model not found — run squish pull <model> first |
| 429 | Too many requests — queue is full, back off and retry |
| 500 | Internal server error — check server logs |
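The table marks only 429 as a "back off and retry" condition, so a client might retry that status with growing delays and surface everything else immediately. The sketch below is one such policy. The exponential schedule, the attempt limit, and the name `send_with_retry` are assumptions, and whether the server sends a `Retry-After` header is not stated here.

```python
import time

RETRYABLE = {429}  # the only status the table describes as retryable


def send_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status.

    send is any callable returning (status, body); delays double each retry.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay * (2 ** attempt))
```

Passing the request as a callable keeps the retry policy independent of the HTTP library in use.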
CLI reference
| Command | Description |
|---|---|
| `squish pull <model>` | Download + compress a model |
| `squish run <model>` | Interactive chat REPL |
| `squish run <model> --prompt "..."` | Single-turn inference |
| `squish serve` | Start the API server |
| `squish serve --port N` | Custom port |
| `squish list` | List downloaded models |
| `squish rm <model>` | Delete a model |
| `squish search [query]` | Search the community hub |
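When the server is launched from a script, the `squish serve` command and the environment variables from this reference can be assembled together. The sketch below only builds the argument list and environment and does not start anything. `serve_command` and its parameters are hypothetical, and it uses only the `--port` flag and the variables documented here.

```python
import os


def serve_command(port=None, api_key=None, offline=False, base_env=None):
    """Return (argv, env) for starting the server via `squish serve`."""
    argv = ["squish", "serve"]
    if port is not None:
        argv += ["--port", str(port)]
    # Copy the environment so the caller's mapping is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    if api_key:
        env["SQUISH_API_KEY"] = api_key
    if offline:
        env["SQUISH_OFFLINE"] = "1"
    return argv, env
```

The result can be handed to `subprocess.Popen(argv, env=env)` on a machine where `squish` is installed.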