Documentation
Chernion API
OpenAI- and Anthropic-compatible inference at a fraction of list price. If your tool already talks to either API, it already talks to Chernion.
Quickstart
Chernion speaks the OpenAI and Anthropic APIs. Point any tool that already talks to either one at Chernion, drop in your key, and you're running GPT, Claude, and Gemini at a fraction of list price. Nothing else in your code changes.
- Base URL (the gateway): https://api.chernion.ai. OpenAI-style routes live under /v1; the Anthropic route is /v1/messages.
- Get a key under Dashboard → API Keys. Keys look like sk-chrn-…
- Every billed call returns its exact cost, in the X-Chernion-Cost-Micro-Usd header and a chernion block on the body.
- All amounts are integer micro-USD (1 USD = 1,000,000).
curl https://api.chernion.ai/v1/chat/completions \
-H "Authorization: Bearer sk-chrn-..." \
-H "Content-Type: application/json" \
-d '{
"model": "sonnet-4.6",
"messages": [{"role": "user", "content": "Hello"}]
}'
Authentication
Send your key on every key-protected request in either of two header styles (both work on every route):
Authorization: Bearer sk-chrn-... # OpenAI-style
x-api-key: sk-chrn-... # Anthropic-style
Keep keys on a server, not in the browser. Listing models (GET /v1/models) needs no key; everything that runs a model, hosts a file, or reads your balance does.
Cost & balance
Every billed response carries the cost two ways: the X-Chernion-Cost-Micro-Usd response header, and a chernion object on the JSON body:
"chernion": {
"cost_micro_usd": 42,
"cost_usd": "0.000042",
"balance_after_micro_usd": 24826976
}
Models
https://api.chernion.ai/v1/models
The catalog in the standard OpenAI list shape, with live pricing and a chernion extension per model. No key required. Read capabilities from here rather than hardcoding: ids, pricing, and feature flags change.
Per-model chernion extension
- effort → { "levels": [...], "default": "high" } on reasoning models (Claude), or null when the model has no effort knob. See Effort.
- vision → boolean. Whether raw images can be sent to this model. See Attachments.
- input_per_mtok / output_per_mtok: what you pay per million tokens, in micro-USD.
- official_input_per_mtok / official_output_per_mtok: the provider's list price per million tokens; discount_input / discount_output give the fraction you save against it.
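The pricing fields above lend themselves to a quick pre-flight estimate. The following is a minimal sketch, not the gateway's billing logic: it assumes prices are integer micro-USD per million tokens (consistent with the catalog sample and the CLI table), and the rounding direction is a guess. The helper names (`discount`, `estimate_micro_usd`, `format_usd`) are hypothetical.

```python
from decimal import Decimal

MICRO_PER_USD = 1_000_000    # documented: 1 USD = 1,000,000 micro-USD
TOKENS_PER_MTOK = 1_000_000  # prices are quoted per million tokens

# Pricing fields of one catalog entry, copied from the sample response in this section.
OPUS = {
    "id": "opus-4.8",
    "chernion": {
        "input_per_mtok": 9_000_000,
        "output_per_mtok": 45_000_000,
        "official_input_per_mtok": 15_000_000,
        "official_output_per_mtok": 75_000_000,
    },
}


def discount(price: int, official: int) -> Decimal:
    """Fraction saved against list price, rounded to two decimals."""
    saved = Decimal(1) - Decimal(price) / Decimal(official)
    return saved.quantize(Decimal("0.01"))


def estimate_micro_usd(entry: dict, input_tokens: int, output_tokens: int) -> int:
    """Rough estimate in micro-USD. Rounds up; the gateway's rounding is an assumption."""
    ext = entry["chernion"]
    total = input_tokens * ext["input_per_mtok"] + output_tokens * ext["output_per_mtok"]
    return -(-total // TOKENS_PER_MTOK)  # ceiling division on integers


def format_usd(micro_usd: int) -> str:
    """Render micro-USD as a six-decimal USD string, like cost_usd in the chernion block."""
    return f"{Decimal(micro_usd) / MICRO_PER_USD:.6f}"
```

With the sample entry, `discount(9_000_000, 15_000_000)` gives `0.40`, and 1,000 input plus 500 output tokens estimate to 31,500 micro-USD (`0.031500` USD). Treat the result as a budget check only; the billed amount is whatever the response's chernion block reports.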
Response (one entry)
{
  "object": "list",
  "data": [
    {
      "id": "opus-4.8",
      "object": "model",
      "chernion": {
        "effort": { "levels": ["low", "medium", "high", "xhigh"], "default": "high" },
        "vision": true,
        "input_per_mtok": 9000000,
        "output_per_mtok": 45000000,
        "official_input_per_mtok": 15000000,
        "official_output_per_mtok": 75000000,
        "discount_input": "0.40",
        "discount_output": "0.40"
      }
    }
  ]
}
Available ids today:
fable-5, opus-4.8, opus-4.6, gpt-5.5, gpt-5.4, gpt-5.4-mini, sonnet-4.6, haiku-4.5, gemini-3.1-pro, gemini-3-flash, gemini-2.5-flash, gemini-3.1-flash-image, codex-auto-review, gpt-5.3-codex-spark
Chat API
Two standard drop-in surfaces for the OpenAI and Anthropic SDKs and tools (Cursor, Claude Code, …), plus two Chernion convenience endpoints (plain prompt in, text out). The standard surfaces are where effort and attachments apply.
| Endpoint | Shape | Effort | Files |
|---|---|---|---|
| POST /v1/chat/completions | OpenAI Chat Completions | ✓ | ✓ |
| POST /v1/messages | Anthropic Messages | ✓ | ✓ |
| POST /v1/messages/count_tokens | Anthropic token count | · | · |
| POST /v1/chat/runs | Server-side resumable run | ✓ | ✓ |
| POST /chat | Simple chat | · | · |
| POST /code | Coding | · | · |
Model ids: all surfaces accept Chernion slugs (opus-4.8, fable-5, …). /v1/messages additionally accepts Anthropic-style names (claude-opus-4-…), which map to the catalog.
OpenAI · POST /v1/chat/completions
https://api.chernion.ai/v1/chat/completions
Standard OpenAI request: temperature, tools, etc. pass through. The response is a normal OpenAI completion plus the chernion cost block. Supports effort and attachments.
// request
{
  "model": "sonnet-4.6",
  "messages": [{"role": "user", "content": "Hello"}],
  "max_tokens": 1024
}
// response
{
  "id": "chatcmpl-016mdGyS7VEQ",
  "model": "sonnet-4.6",
  "object": "chat.completion",
  "created": 1781272612,
  "choices": [
    {"index": 0, "message": {"role": "assistant", "content": "hi"}, "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 27, "completion_tokens": 4, "total_tokens": 31},
  "chernion": {"cost_micro_usd": 42, "cost_usd": "0.000042", "balance_after_micro_usd": 24826976}
}
Streaming: set "stream": true for text/event-stream chunks ending in data: [DONE]. Add "stream_options": {"include_usage": true} to get a final chunk carrying usage and the chernion cost.
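As an illustration of consuming that stream, here is a minimal sketch that folds SSE lines into the final text plus the usage and cost from the last chunk. It assumes the standard OpenAI chunk layout (`choices[].delta.content`) and that the chernion block sits at the top level of the final chunk; the function name `collect_stream` is hypothetical, and feeding it lines from an HTTP client is left to the caller.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> dict:
    """Fold SSE lines into the reply text plus usage and cost from the final chunk."""
    text_parts = []
    usage = None
    cost = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
        # Assumption: usage and the chernion block arrive on the final chunk only.
        usage = chunk.get("usage") or usage
        cost = chunk.get("chernion") or cost
    return {"text": "".join(text_parts), "usage": usage, "chernion": cost}
```

Without "stream_options": {"include_usage": true} in the request, the `usage` and `chernion` keys of the result would simply stay `None`.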
Anthropic · POST /v1/messages
https://api.chernion.ai/v1/messages
The Anthropic Messages API, Claude Code compatible. Returns a standard Anthropic message plus the chernion block. Set "stream": true for SSE. Supports effort and attachments.
curl https://api.chernion.ai/v1/messages \
-H "x-api-key: sk-chrn-..." \
-H "anthropic-version: 2023-06-01" \
-H "Content-Type: application/json" \
-d '{
"model": "opus-4.8",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello"}]
}'
Anthropic · POST /v1/messages/count_tokens
https://api.chernion.ai/v1/messages/count_tokens
Anthropic token-count: same body as /v1/messages, returns {"input_tokens": N} without running the model. Not billed.
Chernion · POST /chat
https://api.chernion.ai/chat
Simple chat: a plain prompt in, text out. Pass message for a one-shot, or messages for a turn list.
// request
{ "model": "sonnet-4.6", "message": "Hello", "system": "Be terse.", "max_tokens": 256 }
// response
{ "model": "sonnet-4.6", "reply": "hi", "usage": {...}, "chernion": {...} }
Chat runs (server-side, resumable)
Server-side generation that survives a page refresh or a dropped connection. The run keeps going on the gateway and you reconnect to it. This is what powers the website chat. Supports effort and attachments.
Start · POST /v1/chat/runs
https://api.chernion.ai/v1/chat/runs
// request
{ "model": "opus-4.8", "messages": [...], "max_tokens": 4000, "effort": "high" }
// response
{ "run_id": "run_4hT2...", "status": "running" }
Snapshot · GET /v1/chat/runs/{run_id}
https://api.chernion.ai/v1/chat/runs/{run_id}
Point-in-time state: { "status", "content", "usage", "cost" }. status is one of running, complete, cancelled, error.
Stream · GET /v1/chat/runs/{run_id}/stream
https://api.chernion.ai/v1/chat/runs/{run_id}/stream
Reconnectable SSE: replays the content generated so far, then streams the rest to completion. Safe to call again after a disconnect.
Cancel · POST /v1/chat/runs/{run_id}/cancel
https://api.chernion.ai/v1/chat/runs/{run_id}/cancel
Stops generation. Output produced before the cancel is billed.
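When a live stream is not needed, the snapshot endpoint can be polled until the run reaches a terminal status. A minimal sketch, assuming a caller-supplied `fetch_snapshot` callable (hypothetical) that performs GET /v1/chat/runs/{run_id} and returns the parsed JSON; the poll interval and attempt cap are arbitrary illustration values.

```python
import time
from typing import Callable

# Statuses after which a run no longer changes, per the snapshot description above.
TERMINAL = {"complete", "cancelled", "error"}


def wait_for_run(
    fetch_snapshot: Callable[[], dict],
    poll_seconds: float = 1.0,
    max_polls: int = 600,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the snapshot until the run leaves the running state, then return it."""
    for _ in range(max_polls):
        snapshot = fetch_snapshot()
        if snapshot["status"] in TERMINAL:
            return snapshot
        sleep(poll_seconds)
    raise TimeoutError("run did not reach a terminal status")
```

Because the run lives on the gateway, this loop can be restarted from a fresh process with only the run_id; the `sleep` parameter exists so the loop can be exercised without real delays.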
Coding API
https://api.chernion.ai/code
A coding-focused convenience endpoint: a prompt in, code out. code is the extracted source; raw is the full model reply (prose + fences) if you want it.
// request
{ "model": "opus-4.8", "prompt": "binary search over a sorted int array", "language": "python", "max_tokens": 512 }
// response
{
"model": "opus-4.8",
"language": "python",
"code": "def bsearch(xs, t): ...",
"raw": "Here's a clean implementation: ...",
"usage": {...},
"chernion": {...}
}
Effort (reasoning depth)
effort controls how hard the model thinks (and how much it spends) per request. Higher effort means smarter answers at the cost of more latency and tokens. It applies to Claude models only; sending it to GPT or Gemini returns an error.
Send it as a top-level string on /v1/chat/completions, /v1/messages, and /v1/chat/runs. Read the valid levels per model from GET /v1/models → chernion.effort.levels (ordered Faster → Smarter); don't hardcode them, as they can change. A model with "effort": null has no knob. The default is high when you omit the field.
OpenAI-compatible · POST /v1/chat/completions
curl https://api.chernion.ai/v1/chat/completions \
-H "Authorization: Bearer $CHERNION_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "opus-4.8",
"messages": [{"role":"user","content":"Refactor this module."}],
"effort": "xhigh"
}'
Anthropic-native · POST /v1/messages
curl https://api.chernion.ai/v1/messages \
-H "x-api-key: $CHERNION_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "Content-Type: application/json" \
-d '{
"model": "opus-4.8",
"max_tokens": 4000,
"messages": [{"role":"user","content":"Refactor this module."}],
"effort": "high"
}'
Aliases: on Claude models, reasoning_effort is accepted as an alias for effort; on OpenAI models reasoning_effort passes through natively.
Errors: sending effort to a non-Claude model, or an out-of-range level, returns 400 with code unsupported_parameter.
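One way to avoid that 400 is to check the requested level against the catalog entry before sending. A minimal sketch, assuming `entry` is one item from the data array of GET /v1/models; the function name `resolve_effort` and the sample entries are hypothetical.

```python
from typing import Optional


def resolve_effort(entry: dict, requested: Optional[str]) -> Optional[str]:
    """Return the effort value to send, or None to leave the field out of the request."""
    knob = entry.get("chernion", {}).get("effort")
    if knob is None:
        # "effort": null in the catalog means the model has no effort knob.
        if requested is not None:
            raise ValueError(f"{entry['id']} does not accept effort")
        return None
    if requested is None:
        return None  # omit the field and let the gateway default apply
    if requested not in knob["levels"]:
        raise ValueError(f"{requested!r} is not one of {knob['levels']} for {entry['id']}")
    return requested
```

For example, with an entry whose levels are low, medium, high, and xhigh, `resolve_effort(entry, "xhigh")` returns `"xhigh"`, while an unknown level raises locally instead of costing a round trip.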
Attachments (images & files)
Send images and files alongside text on /v1/chat/completions, /v1/messages, and /v1/chat/runs. Files work on every model (extracted to text server-side); images need chernion.vision: true on the model (see Models).
OpenAI shape · content parts
"content": [
{ "type": "text", "text": "What's in these?" },
{ "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } },
{ "type": "input_file", "filename": "report.pdf", "mime_type": "application/pdf", "file_data": "<base64>" }
]
Anthropic shape · content blocks
"content": [
{ "type": "text", "text": "What's in these?" },
{ "type": "image", "source": { "type": "base64", "media_type": "image/png", "data": "<base64>" } },
{ "type": "input_file", "filename": "report.pdf", "mime_type": "application/pdf", "file_data": "<base64>" }
]
By reference · large files
Upload once to POST /v1/files, then reference the returned id instead of inlining base64 (handy for big or reused attachments):
{ "type": "input_file", "file_id": "file_..." }
{ "type": "input_image", "file_id": "file_..." }
Limits & types
- Up to 8 images per request, 10 MiB per attachment.
- Supported files: text / code / config / data, .pdf, .docx, .xlsx. Other types fall back to a "couldn't read" note.
- Errors: 400 invalid_attachment (malformed, or image on a non-vision model), 413 oversized.
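The OpenAI-shape parts and the limits above can be assembled and checked client-side before a request goes out. A minimal sketch: the helper names are hypothetical, and measuring the 10 MiB limit against the raw bytes (rather than the base64 text) is an assumption.

```python
import base64
from typing import List

MAX_IMAGES = 8                # documented per-request image limit
MAX_BYTES = 10 * 1024 * 1024  # documented 10 MiB per attachment


def _encode(data: bytes) -> str:
    """Base64-encode an attachment after a local size check."""
    if len(data) > MAX_BYTES:
        raise ValueError("attachment exceeds 10 MiB; the gateway would answer 413")
    return base64.b64encode(data).decode("ascii")


def image_part(data: bytes, mime_type: str) -> dict:
    """Inline image as a data URL, in the OpenAI content-part shape."""
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{_encode(data)}"}}


def file_part(data: bytes, filename: str, mime_type: str) -> dict:
    """Inline file in the input_file shape shown in this section."""
    return {"type": "input_file", "filename": filename, "mime_type": mime_type, "file_data": _encode(data)}


def build_content(text: str, parts: List[dict]) -> List[dict]:
    """Prepend the text part and enforce the image count limit."""
    images = sum(1 for part in parts if part["type"] == "image_url")
    if images > MAX_IMAGES:
        raise ValueError(f"{images} images in one request; the limit is {MAX_IMAGES}")
    return [{"type": "text", "text": text}, *parts]
```

The returned list drops straight into the "content" field of a user message on /v1/chat/completions; for large or reused files, the by-reference form with a file_id avoids the base64 overhead entirely.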
Files (temp hosting)
Short-lived, link-shareable file hosting with a 7-day expiry. It serves double duty: plain download hosting, and the upload source for attachment file_id references.
Upload · POST /v1/files
https://api.chernion.ai/v1/files
Multipart form upload (field file). Returns the record and a download URL.
curl https://api.chernion.ai/v1/files \
-H "Authorization: Bearer sk-chrn-..." \
-F "file=@report.pdf"
# => {
# "id": "file_...", "filename": "report.pdf", "size_bytes": 184320,
# "content_type": "application/pdf", "download_url": "https://api.chernion.ai/v1/files/<token>",
# "download_path": "/v1/files/<token>", "expires_at": "2026-06-23T12:00:00Z"
# }
Download · GET /v1/files/{token}
https://api.chernion.ai/v1/files/{token}
Serves the file as an attachment. 410 once expired or missing.
Metadata · GET /v1/files/{token}/info
https://api.chernion.ai/v1/files/{token}/info
Returns the record without the bytes, or 410 if gone.
Delete · DELETE /v1/files/{id}
DELETE https://api.chernion.ai/v1/files/{id}
Owner-only delete by id.
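Since uploads expire after 7 days, a client that caches upload records may want to check expires_at before reusing a file_id as an attachment. A minimal sketch under that assumption; the helper names and the sample id `file_abc` are hypothetical, and how the gateway responds to an expired file_id inside a chat request is not covered here.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_expiry(record: dict) -> datetime:
    """Parse expires_at (e.g. "2026-06-23T12:00:00Z") into an aware UTC datetime."""
    # Swapping the Z suffix keeps fromisoformat working on Python versions before 3.11.
    return datetime.fromisoformat(record["expires_at"].replace("Z", "+00:00"))


def is_expired(record: dict, now: Optional[datetime] = None) -> bool:
    """True once the record's expiry time has passed."""
    now = now or datetime.now(timezone.utc)
    return now >= parse_expiry(record)


def file_reference(record: dict, now: Optional[datetime] = None) -> dict:
    """Build the by-reference attachment part, refusing records that have expired."""
    if is_expired(record, now):
        raise ValueError("upload has expired; upload the file again")
    return {"type": "input_file", "file_id": record["id"]}
```

Passing `now` explicitly makes the check deterministic in tests; in normal use it defaults to the current UTC time.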
Claude Code
Claude Code talks to the Anthropic endpoint, so point it at Chernion's:
export ANTHROPIC_BASE_URL=https://api.chernion.ai
export ANTHROPIC_AUTH_TOKEN=sk-chrn-...
export ANTHROPIC_MODEL=opus-4.8
# optional, for its background tasks:
export ANTHROPIC_SMALL_FAST_MODEL=haiku-4.5
claude
(Windows: setx ANTHROPIC_BASE_URL https://api.chernion.ai, and so on.) Claude Code now runs on Chernion, billed from your balance.
CLI
The chernion command-line agent (version 0.1.1) runs on the same gateway: chat, generate code, read and edit files in your project, run commands, and reach the internet without leaving your shell. Same model ids, the same sk-chrn-… key, and the same per-call cost printed after every run. It also reads skills, subagents, and rules from a .chernion/ folder.
Install
npm install -g @chernion/cli@latest # the `chernion` command (alias: chrn)
chernion --version # chernion 0.1.1
# or run it once, no install:
npx @chernion/cli@latest chat "Hello"
Authenticate
Drop your key in the environment, or run chernion login to store it. Grab one under Dashboard → API Keys.
export CHERNION_API_KEY=sk-chrn-... # picked up automatically
# or save it (to ~/.config/chernion, Windows: %APPDATA%\chernion):
chernion login
# point at a different gateway:
export CHERNION_BASE_URL=https://api.chernion.ai
(Windows: setx CHERNION_API_KEY sk-chrn-....)
chernion chat
A prompt in, an answer out, streamed by default. Pass text for a one-shot, or run it bare for an interactive session; pipe a file in and it's appended to your prompt. Supports effort and attachments.
chernion chat "Explain this stack trace" -m opus-4.8 -e high
cat error.log | chernion chat "what blew up?"
chernion chat -a diagram.png "what does this describe?"
chernion chat # interactive: /model /effort /system /clear /exit
Every run ends with the exact cost, read straight off the response, e.g. · opus-4.8 · 1,204 tok · $0.001083.
chernion code
The coding endpoint from your shell: a prompt in, source out. Prints just the extracted code; --raw keeps the prose, -o writes a file.
chernion code "binary search over a sorted int array" -l python
chernion code "a debounce hook" -l ts -o useDebounce.ts
chernion models
The live catalog: ids, vision, effort levels, and price per million tokens. No key required; read ids from here instead of hardcoding.
chernion models
# id vision effort $in/Mtok $out/Mtok
# opus-4.8 yes low·medium·high·xhigh 9.00 45.00
# sonnet-4.6 yes - ...
chernion models --json # raw GET /v1/models
chernion files
Upload, list, and fetch the 7-day temp files. An upload prints an id you can hand straight to chat as an attachment, plus a shareable URL.
chernion files upload report.pdf
# file_... report.pdf 180 KB expires in 7 days
chernion chat -a file_... "summarize the findings"
Flags & config
- -m, --model: any catalog slug (default sonnet-4.6, or your saved default).
- -e, --effort (Claude only): low · medium · high · xhigh (see Effort).
- -a, --attach <path|file_id>: repeatable; images & files (see Attachments). 8 images, 10 MiB each.
- -s, --system <text>: system prompt. --max-tokens <n>, --no-stream, --json.
- chernion config set model opus-4.8 saves a default; chernion balance shows what's left.
Agent tools
Run bare, chernion is a coding agent that reads, edits, and runs commands in your project, gated by the permission mode you set (--ask, --auto-edit, --plan, --yolo). The read tools run freely; the edit and shell tools go through the gate.
| Tool | Class | What it does |
|---|---|---|
| read_file | read | Read a file with line numbers. |
| list_dir | read | List a directory (respects .gitignore). |
| glob | read | Find files by pattern, e.g. src/**/*.ts. |
| grep | read | Regex search across files. |
| fetch_url | read | HTTP/HTTPS request from your machine; returns the response as text. |
| write_file | edit | Create or overwrite a file (gated). |
| edit_file | edit | Exact-match replace with a diff preview (gated). |
| download_file | edit | Download a URL into the workspace (gated). |
| run_bash | shell | Run a command in the workspace (gated). |
Internet access (new in 0.1.1): fetch_url makes an HTTP or HTTPS request from your machine and returns the body as text, for reading web pages or calling JSON / REST APIs (it caps long text responses and flags binary content). download_file saves a URL straight into the workspace (images, fonts, audio/video, archives); the saved path is confined to the workspace, the same as the file tools.
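The permission gate described in this section can be pictured as a small decision table over the permission mode and the tool class. This is an illustrative sketch of that reading, not the CLI's implementation: the `Decision` enum and `gate` function are hypothetical, and treating plan mode as a hard deny for edit and shell tools is inferred from its "read only" description.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    PROMPT = "prompt"
    DENY = "deny"


# Tool classes as listed in the Agent tools table.
TOOL_CLASS = {
    "read_file": "read", "list_dir": "read", "glob": "read", "grep": "read", "fetch_url": "read",
    "write_file": "edit", "edit_file": "edit", "download_file": "edit",
    "run_bash": "shell",
}


def gate(mode: str, tool: str) -> Decision:
    """Decide whether a tool call runs, prompts, or is refused under a permission mode."""
    tool_class = TOOL_CLASS[tool]
    if tool_class == "read":
        return Decision.ALLOW  # read tools run freely in every mode
    if mode == "yolo":
        return Decision.ALLOW  # no prompts
    if mode == "plan":
        return Decision.DENY   # assumption: read only means edits and shell are refused
    if mode == "auto-edit":
        return Decision.ALLOW if tool_class == "edit" else Decision.PROMPT
    if mode == "ask":
        return Decision.PROMPT  # prompt before writes and shell
    raise ValueError(f"unknown permission mode: {mode}")
```

Note that download_file sits in the edit class, so under this reading it is applied without asking in auto-edit mode, while run_bash still prompts.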
Slash commands (REPL)
Type these inside the interactive session.
| Command | What it does |
|---|---|
| /model | Switch model. No argument opens a picker. |
| /effort | Set reasoning effort. No argument opens a picker. |
| /mode | Open the permission mode picker. |
| /ask | Switch to ask mode (prompt before writes and shell). |
| /auto-edit | Apply edits without asking, still prompt for shell. |
| /plan | Plan mode: read only, explore and propose. |
| /accept | Exit plan mode and apply (moves to auto-edit). |
| /yolo | Yolo mode: no prompts. |
| /system | Set a session role. /system clear resets it. |
| /skills | List the skills discovered under .chernion/. |
| /agents | List the subagents discovered under .chernion/. |
| /improve | Toggle auto prompt improvement (/improve on|off). |
| /tokens | Show session token usage. Aliases: /usage, /cost. |
| /clear | Clear chat history and memory (fresh start). Alias: /reset. |
| /compact | Summarize history to save context. |
| /login | How to update your key. |
| /help | Command list. Alias: /?. |
| /exit | Quit. Alias: /quit. |
/clear (alias /reset): starts a completely fresh session. It wipes the model's conversation history and the running token/cost totals, and it clears the terminal screen and scrollback so the previous conversation is gone from view too. Use it when you want the agent to forget everything and start clean. This affects the interactive session only; it does not touch the server-backed memory used by the separate chernion chat command.
/compact: summarizes the current conversation to save context, the same idea as Claude's /compact. It makes one call to the model to condense the session into terse notes (decisions made, files touched, open tasks), then continues from that summary so the agent keeps its bearings while using far fewer tokens. Your visible scrollback is left intact for reference.
Auto prompt improvement
Before each turn, chernion refines your prompt into a clearer, self-contained instruction without changing its intent, then shows you the refined version. It is on by default and skips trivial inputs (greetings, very short messages, slash commands).
Turn it off for a session with /improve off (and back on with /improve on), or persist the choice by setting "improvePrompt": false in ~/.chernion/config.json.
The CLI uses the same error shapes as the API (see Errors & limits): a bad key is 401 invalid_api_key, an empty balance is 402 insufficient_balance. On either, the CLI exits non-zero and prints the code.
Skills, subagents, and rules
The CLI discovers skills, subagents, and rules from a .chernion/ folder, mirroring Claude Code's .claude/ layout. Two scopes are merged: the project scope (<repo>/.chernion/) overrides the user scope (~/.chernion/) when a name clashes.
.chernion/
  rules/*.md             # always-on house rules (root CHERNION.md and AGENTS.md still work too)
  skills/<name>/SKILL.md # a reusable instruction pack
  agents/<name>.md       # a specialist subagent
Skills
Skills use progressive disclosure: only each skill's name and description sit in the prompt, and the agent calls the use_skill tool to load the full body when a request matches. A skill is a folder with a SKILL.md file.
Subagents
A subagent is a Markdown file with frontmatter (name, description, and optional tools and model) and a body that becomes its system prompt. The agent delegates a self-contained subtask to it through the run_agent tool. Subagents share your permission mode and cannot spawn further subagents.
Frontmatter is a small YAML subset, for example:
---
name: landing-page
description: build a polished marketing landing page
---
1. Confirm the brand and goal.
2. Build a responsive hero.
3. Ship to ./site.
Discovery & commands
use_skill and run_agent are only offered to the model when the workspace actually defines skills or agents, so a bare workspace carries zero overhead. List what was found with the /skills and /agents slash commands.
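To make the file format and the scope merge from this section concrete, here is a minimal sketch of how a frontmatter file might be split and how two scopes might be combined. It handles only flat `key: value` lines, which is an assumption about the "small YAML subset"; the function names are hypothetical and this is not the CLI's own parser.

```python
from typing import Dict, Tuple


def parse_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a Markdown file into (frontmatter fields, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    meta: Dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the body.
            return meta, "\n".join(lines[index + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # unterminated block: treat the whole file as body


def merge_scopes(user: Dict[str, str], project: Dict[str, str]) -> Dict[str, str]:
    """Combine name -> path maps; the project scope wins when a name clashes."""
    return {**user, **project}
```

Applied to the example file above, `parse_frontmatter` yields the name and description fields, with the numbered steps as the body that becomes the prompt.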
SDKs & Tools
OpenAI SDK
from openai import OpenAI
client = OpenAI(base_url="https://api.chernion.ai/v1", api_key="sk-chrn-...")
client.chat.completions.create(model="sonnet-4.6", messages=[{"role": "user", "content": "Hi"}])
Anthropic SDK
import anthropic
client = anthropic.Anthropic(base_url="https://api.chernion.ai", api_key="sk-chrn-...")
client.messages.create(model="opus-4.8", max_tokens=1024, messages=[{"role": "user", "content": "Hi"}])
Cursor
Settings → Models → Override OpenAI Base URL https://api.chernion.ai/v1, paste your key, add a model id (e.g. sonnet-4.6).
Continue (~/.continue/config.json)
{
  "models": [
    {
      "title": "Chernion · Sonnet 4.6",
      "provider": "openai",
      "model": "sonnet-4.6",
      "apiBase": "https://api.chernion.ai/v1",
      "apiKey": "sk-chrn-..."
    }
  ]
}
LangChain
from langchain_openai import ChatOpenAI
ChatOpenAI(base_url="https://api.chernion.ai/v1", api_key="sk-chrn-...", model="sonnet-4.6")
Errors & limits
Check the HTTP status, then read the body. OpenAI routes return {"error":{"message","type","code"}}; the Anthropic route returns {"type":"error","error":{...}}.
| Status | code | Meaning |
|---|---|---|
| 400 | unsupported_parameter | bad effort level, or effort on a non-Claude model |
| 400 | invalid_attachment | malformed image/file part, or image on a non-vision model |
| 400 | invalid_messages | messages array missing or malformed |
| 401 | invalid_api_key | missing / invalid / revoked key |
| 402 | insufficient_balance | top up to keep going |
| 404 | model_not_found | unknown model id |
| 413 | oversized | attachment over 10 MiB |
| 429 | rate_limited | back off; honor Retry-After |
| 502 / 503 | upstream_failed | provider hiccup; retry |
Limits: 8 images per request, 10 MiB per attachment. All amounts are integer micro-USD (1 USD = 1,000,000). On 429, honor the Retry-After header.
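The status table above suggests a simple client policy: retry only on 429, 502, and 503, and prefer Retry-After when the gateway sends it. A minimal sketch with hypothetical helper names; the backoff constants are arbitrary, only the seconds form of Retry-After is handled, and falling back to the inner `type` field when no `code` is present is an assumption about the Anthropic-shaped body.

```python
from typing import Mapping, Optional

RETRYABLE = {429, 502, 503}  # rate_limited and upstream_failed in the table above


def error_code(body: dict) -> Optional[str]:
    """Pull a code out of either error shape; both nest the details under "error"."""
    err = body.get("error") or {}
    # Assumption: when "code" is absent, the inner "type" is the best identifier available.
    return err.get("code") or err.get("type")


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base: float = 0.5,
    cap: float = 30.0,
) -> Optional[float]:
    """Seconds to wait before retrying, or None when the error should not be retried."""
    if status not in RETRYABLE:
        return None
    retry_after = headers.get("Retry-After")
    if status == 429 and retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    return min(cap, base * (2 ** attempt))  # capped exponential backoff
```

Errors such as 401 invalid_api_key or 402 insufficient_balance return `None` here, since retrying them unchanged cannot succeed. Header lookup is case-sensitive with a plain dict, so a case-insensitive mapping from the HTTP client is the safer input.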