Documentation

Chernion API

OpenAI- and Anthropic-compatible inference at a fraction of list price. If your tool already talks to either API, it already talks to Chernion.

Quickstart

Chernion speaks the OpenAI and Anthropic APIs. Point any tool that already talks to either one at Chernion, drop in your key, and you're running GPT, Claude, and Gemini at a fraction of list price. Nothing else in your code changes.

· Base URL (the gateway): https://api.chernion.ai. OpenAI-style routes live under /v1; the Anthropic route is /v1/messages.
· Get a key under Dashboard → API Keys. Keys look like sk-chrn-…
· Every billed call returns its exact cost twice: in the X-Chernion-Cost-Micro-Usd header and in a chernion block on the body.
· All amounts are integer micro-USD (1 USD = 1,000,000).

bash

curl https://api.chernion.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-chrn-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sonnet-4.6",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Authentication

Send your key on every key-protected request in either header style; both are accepted on every route:

http

Authorization: Bearer sk-chrn-...   # OpenAI-style
x-api-key: sk-chrn-...               # Anthropic-style

Keep keys on a server, not in the browser. Listing models (GET /v1/models) needs no key; everything that runs a model, hosts a file, or reads your balance does.
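As a rough illustration, the two header styles can be built from one helper. The function name `auth_headers` is hypothetical, and reading the key from `CHERNION_API_KEY` simply borrows the variable name the CLI section uses.

```python
import os

def auth_headers(api_key: str, style: str = "openai") -> dict:
    """Return the auth header for one request in either accepted style."""
    if not api_key:
        raise ValueError("missing API key")
    if style == "openai":
        return {"Authorization": f"Bearer {api_key}"}
    if style == "anthropic":
        return {"x-api-key": api_key}
    raise ValueError(f"unknown style: {style!r}")

# Read the key from the server's environment instead of hardcoding it.
key = os.environ.get("CHERNION_API_KEY", "sk-chrn-...")
print(sorted(auth_headers(key, "anthropic")))  # ['x-api-key']
```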

Cost & balance

Every billed response carries the cost two ways: the X-Chernion-Cost-Micro-Usd response header, and a chernion object on the JSON body:

json

"chernion": {
  "cost_micro_usd": 42,
  "cost_usd": "0.000042",
  "balance_after_micro_usd": 24826976
}
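Since every amount is an integer count of micro-USD, converting for display is one exact division. The sketch below reads the cost from the body block, falls back to the header, and formats with `Decimal` to avoid float rounding. The helper names are illustrative, and a plain `dict` stands in for the response headers (real HTTP clients usually give you a case-insensitive mapping).

```python
from decimal import Decimal

MICRO_PER_USD = 1_000_000

def micro_to_usd(micro: int) -> str:
    """Format integer micro-USD as a six-decimal dollar string."""
    return f"{Decimal(micro) / MICRO_PER_USD:.6f}"

def read_cost(headers: dict, body: dict) -> int:
    """Prefer the chernion block on the body; fall back to the header."""
    block = body.get("chernion")
    if block is not None:
        return int(block["cost_micro_usd"])
    return int(headers["X-Chernion-Cost-Micro-Usd"])

body = {"chernion": {"cost_micro_usd": 42, "cost_usd": "0.000042",
                     "balance_after_micro_usd": 24826976}}
print(micro_to_usd(read_cost({}, body)))                          # 0.000042
print(micro_to_usd(body["chernion"]["balance_after_micro_usd"]))  # 24.826976
```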

Models

GET https://api.chernion.ai/v1/models

The catalog in the standard OpenAI list shape, with live pricing and a chernion extension per model. No key required. Read capabilities from here rather than hardcoding: ids, pricing, and feature flags change.

Per-model chernion extension

· effort → { "levels": [...], "default": "high" } on reasoning models (Claude), or null when the model has no effort knob. See Effort.
· vision → boolean. Whether raw images can be sent to this model. See Attachments.
· input_per_mtok / output_per_mtok: what you pay per million tokens.
· official_input_per_mtok / official_output_per_mtok: the provider's list price. discount_input / discount_output: the fraction you save against it ("0.40" means 40% off).

Response (one entry)

json

{
  "object": "list",
  "data": [
    {
      "id": "opus-4.8",
      "object": "model",
      "chernion": {
        "effort": { "levels": ["low", "medium", "high", "xhigh"], "default": "high" },
        "vision": true,
        "input_per_mtok": 9000000,
        "output_per_mtok": 45000000,
        "official_input_per_mtok": 15000000,
        "official_output_per_mtok": 75000000,
        "discount_input": "0.40",
        "discount_output": "0.40"
      }
    }
  ]
}
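The pricing fields are enough to budget a call before sending it. This sketch assumes the per-Mtok fields are micro-USD per million tokens (consistent with the 9.00 shown for opus-4.8 in the CLI table) and that cost is a plain tokens × price product. The gateway's actual rounding and any extra billing factors are not documented here, so treat the result as an estimate and the returned cost block as the source of truth.

```python
from decimal import Decimal

entry = {
    "id": "opus-4.8",
    "chernion": {
        "input_per_mtok": 9000000,
        "output_per_mtok": 45000000,
        "official_input_per_mtok": 15000000,
        "official_output_per_mtok": 75000000,
    },
}

def estimate_cost_micro(entry: dict, input_tokens: int, output_tokens: int) -> int:
    """Estimated cost in micro-USD (floored; real rounding is an assumption)."""
    p = entry["chernion"]
    total = input_tokens * p["input_per_mtok"] + output_tokens * p["output_per_mtok"]
    return total // 1_000_000

def discount(entry: dict, side: str) -> Decimal:
    """Fraction saved against list price for side 'input' or 'output'."""
    p = entry["chernion"]
    paid = Decimal(p[f"{side}_per_mtok"])
    official = Decimal(p[f"official_{side}_per_mtok"])
    return (1 - paid / official).quantize(Decimal("0.01"))

print(estimate_cost_micro(entry, 1000, 500))  # 31500
print(discount(entry, "input"))               # 0.40
```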

Available ids today:

fable-5 · opus-4.8 · opus-4.6 · gpt-5.5 · gpt-5.4 · gpt-5.4-mini · sonnet-4.6 · haiku-4.5 · gemini-3.1-pro · gemini-3-flash · gemini-2.5-flash · gemini-3.1-flash-image · codex-auto-review · gpt-5.3-codex-spark

Chat API

Two standard drop-in surfaces for the OpenAI and Anthropic SDKs and tools (Cursor, Claude Code, …), plus two Chernion convenience endpoints (plain prompt in, text out). The standard surfaces are where effort and attachments apply.

Endpoint	Shape	Effort	Files
POST /v1/chat/completions	OpenAI Chat Completions	✓	✓
POST /v1/messages	Anthropic Messages	✓	✓
POST /v1/messages/count_tokens	Anthropic token count	·	·
POST /v1/chat/runs	Server-side resumable run	✓	✓
POST /chat	Simple chat	·	·
POST /code	Coding	·	·

Model ids: all surfaces accept Chernion slugs (opus-4.8, fable-5, …). /v1/messages additionally accepts Anthropic-style names (claude-opus-4-…), which map to the catalog.

OpenAI · POST /v1/chat/completions

POST https://api.chernion.ai/v1/chat/completions

Standard OpenAI request: temperature, tools, etc. pass through. The response is a normal OpenAI completion plus the chernion cost block. Supports effort and attachments.

json

{
  "model": "sonnet-4.6",
  "messages": [{"role": "user", "content": "Hello"}],
  "max_tokens": 1024
}

json

{
  "id": "chatcmpl-016mdGyS7VEQ",
  "model": "sonnet-4.6",
  "object": "chat.completion",
  "created": 1781272612,
  "choices": [
    {"index": 0, "message": {"role": "assistant", "content": "hi"}, "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 27, "completion_tokens": 4, "total_tokens": 31},
  "chernion": {"cost_micro_usd": 42, "cost_usd": "0.000042", "balance_after_micro_usd": 24826976}
}

Streaming: set "stream": true for text/event-stream chunks ending in data: [DONE]. Add "stream_options": {"include_usage": true} to get a final chunk carrying usage and the chernion cost.
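A minimal sketch of consuming that stream: it accumulates the text deltas and keeps the last usage and cost it sees. The chunk layout follows the standard OpenAI streaming shape. Placing the chernion block at the top level of the final chunk, next to usage, is an assumption, as is the function name `collect_stream`.

```python
import json

def collect_stream(lines):
    """Accumulate text deltas from SSE lines; return (text, usage, cost)."""
    text, usage, cost = [], None, None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                text.append(piece)
        usage = chunk.get("usage") or usage
        cost = chunk.get("chernion") or cost  # assumed top-level key
    return "".join(text), usage, cost

sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "h"}}]}',
    '',
    'data: {"choices": [{"index": 0, "delta": {"content": "i"}}]}',
    'data: {"choices": [], "usage": {"total_tokens": 31}, "chernion": {"cost_micro_usd": 42}}',
    'data: [DONE]',
]
print(collect_stream(sample))
```

In a real client, `lines` would be the decoded lines of the HTTP response body rather than a list.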

Anthropic · POST /v1/messages

POST https://api.chernion.ai/v1/messages

The Anthropic Messages API, Claude Code compatible. Returns a standard Anthropic message plus the chernion block. Set "stream": true for SSE. Supports effort and attachments.

bash

curl https://api.chernion.ai/v1/messages \
  -H "x-api-key: sk-chrn-..." \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "opus-4.8",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Anthropic · POST /v1/messages/count_tokens

POST https://api.chernion.ai/v1/messages/count_tokens

Anthropic token-count: same body as /v1/messages, returns {"input_tokens": N} without running the model. Not billed.

Chernion · POST /chat

POST https://api.chernion.ai/chat

Simple chat: a plain prompt in, text out. Pass message for a one-shot, or messages for a turn list.

json

// request
{ "model": "sonnet-4.6", "message": "Hello", "system": "Be terse.", "max_tokens": 256 }

// response
{ "model": "sonnet-4.6", "reply": "hi", "usage": {...}, "chernion": {...} }

Chat runs (server-side, resumable)

Server-side generation that survives a page refresh or a dropped connection. The run keeps going on the gateway and you reconnect to it. This is what powers the website chat. Supports effort and attachments.

Start · POST /v1/chat/runs

POST https://api.chernion.ai/v1/chat/runs

json

// request
{ "model": "opus-4.8", "messages": [...], "max_tokens": 4000, "effort": "high" }

// response
{ "run_id": "run_4hT2...", "status": "running" }

Snapshot · GET /v1/chat/runs/{run_id}

GET https://api.chernion.ai/v1/chat/runs/{run_id}

Point-in-time state: { "status", "content", "usage", "cost" }. status is one of running, complete, cancelled, error.

Stream · GET /v1/chat/runs/{run_id}/stream

GET https://api.chernion.ai/v1/chat/runs/{run_id}/stream

Reconnectable SSE: replays the content generated so far, then streams the rest to completion. Safe to call again after a disconnect.

Cancel · POST /v1/chat/runs/{run_id}/cancel

POST https://api.chernion.ai/v1/chat/runs/{run_id}/cancel

Stops generation. Output produced before the cancel is billed.
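If you don't need the live stream, the run lifecycle can be followed by polling the snapshot route until the status leaves running. The sketch below keeps the HTTP call out of the loop: `fetch_snapshot` is a hypothetical callable you supply (for example, one that does GET /v1/chat/runs/{run_id} with your key and returns the parsed JSON), and the poll interval and limit are arbitrary choices.

```python
import time

TERMINAL = {"complete", "cancelled", "error"}

def wait_for_run(fetch_snapshot, run_id, interval=1.0, max_polls=600, sleep=time.sleep):
    """Poll the snapshot until the run reaches a terminal status."""
    for _ in range(max_polls):
        snap = fetch_snapshot(run_id)
        if snap["status"] in TERMINAL:
            return snap
        sleep(interval)
    raise TimeoutError(f"run {run_id} still running after {max_polls} polls")

# Stand-in for the network: two canned snapshots instead of real GET calls.
fake = iter([
    {"status": "running", "content": "h"},
    {"status": "complete", "content": "hi"},
])
result = wait_for_run(lambda run_id: next(fake), "run_demo", sleep=lambda s: None)
print(result["content"])  # hi
```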

Coding API

POST https://api.chernion.ai/code

A coding-focused convenience endpoint: a prompt in, code out. code is the extracted source; raw is the full model reply (prose + fences) if you want it.

json

// request
{ "model": "opus-4.8", "prompt": "binary search over a sorted int array", "language": "python", "max_tokens": 512 }

// response
{
  "model": "opus-4.8",
  "language": "python",
  "code": "def bsearch(xs, t): ...",
  "raw": "Here's a clean implementation: ...",
  "usage": {...},
  "chernion": {...}
}
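To make the code / raw distinction concrete, here is one way a client could pull the first fenced block out of a raw reply. This is only an illustrative sketch, not the gateway's extraction logic, which is not documented here; the function name `extract_code` is hypothetical.

```python
import re

FENCE = "`" * 3  # built at runtime so this example nests cleanly in docs
BLOCK = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)

def extract_code(raw: str) -> str:
    """Return the body of the first fenced block, or the whole reply if none."""
    match = BLOCK.search(raw)
    return match.group(1).rstrip("\n") if match else raw.strip()

raw = "Here's a clean implementation:\n" + FENCE + "python\ndef f():\n    return 1\n" + FENCE + "\nDone."
print(extract_code(raw))
```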

Effort (reasoning depth)

effort controls how hard the model thinks (and how much it spends) per request. Higher effort means smarter answers at the cost of more latency and tokens. It applies to Claude models only; sending it to GPT or Gemini returns an error.

Send it as a top-level string on /v1/chat/completions, /v1/messages, and /v1/chat/runs. Read the valid levels per model from GET /v1/models → chernion.effort.levels (ordered Faster → Smarter); don't hardcode them, as they can change. A model with "effort": null has no knob. The default is high when you omit the field.

OpenAI-compatible · POST /v1/chat/completions

bash

curl https://api.chernion.ai/v1/chat/completions \
  -H "Authorization: Bearer $CHERNION_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "opus-4.8",
    "messages": [{"role":"user","content":"Refactor this module."}],
    "effort": "xhigh"
  }'

Anthropic-native · POST /v1/messages

bash

curl https://api.chernion.ai/v1/messages \
  -H "x-api-key: $CHERNION_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "opus-4.8",
    "max_tokens": 4000,
    "messages": [{"role":"user","content":"Refactor this module."}],
    "effort": "high"
  }'

Aliases: on Claude models, reasoning_effort is accepted as an alias for effort; on OpenAI models reasoning_effort passes through natively.

Errors: sending effort to a non-Claude model, or an out-of-range level, returns 400 with code unsupported_parameter.
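One way to avoid the 400 is to resolve the level against the catalog entry before sending. In this sketch, `pick_effort` and `with_effort` are hypothetical helpers. They read `chernion.effort` from a model entry shaped like the /v1/models response, fall back to the model's default on an unknown level, and drop the field entirely when the model has no knob.

```python
def pick_effort(model_entry: dict, wanted=None):
    """Return a level the model accepts, or None if effort must be omitted."""
    effort = model_entry.get("chernion", {}).get("effort")
    if effort is None:
        return None  # no effort knob on this model
    if wanted in effort["levels"]:
        return wanted
    return effort["default"]

def with_effort(payload: dict, model_entry: dict, wanted=None) -> dict:
    """Copy the payload, setting or removing 'effort' as the model allows."""
    out = dict(payload)
    out.pop("effort", None)
    level = pick_effort(model_entry, wanted)
    if level is not None:
        out["effort"] = level
    return out

opus = {"id": "opus-4.8", "chernion": {
    "effort": {"levels": ["low", "medium", "high", "xhigh"], "default": "high"}}}
no_knob = {"id": "some-model", "chernion": {"effort": None}}  # placeholder id

print(with_effort({"model": "opus-4.8"}, opus, "xhigh"))
print(with_effort({"model": "some-model", "effort": "high"}, no_knob, "high"))
```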

Attachments (images & files)

Send images and files alongside text on /v1/chat/completions, /v1/messages, and /v1/chat/runs. Files work on every model (extracted to text server-side); images need chernion.vision: true on the model (see Models).

OpenAI shape · content parts

json

"content": [
  { "type": "text", "text": "What's in these?" },
  { "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } },
  { "type": "input_file", "filename": "report.pdf", "mime_type": "application/pdf", "file_data": "<base64>" }
]

Anthropic shape · content blocks

json

"content": [
  { "type": "text", "text": "What's in these?" },
  { "type": "image", "source": { "type": "base64", "media_type": "image/png", "data": "<base64>" } },
  { "type": "input_file", "filename": "report.pdf", "mime_type": "application/pdf", "file_data": "<base64>" }
]

By reference · large files

Upload once to POST /v1/files, then reference the returned id instead of inlining base64 (handy for big or reused attachments):

json

{ "type": "input_file", "file_id": "file_..." }
{ "type": "input_image", "file_id": "file_..." }

Limits & types

· Up to 8 images per request, 10 MiB per attachment.
· Supported files: text / code / config / data, .pdf, .docx, .xlsx. Other types fall back to a "couldn't read" note.
· Errors: 400 invalid_attachment (malformed, or image on a non-vision model), 413 oversized.
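The OpenAI-shaped parts and the limits above can be enforced client-side before a request leaves your machine. This sketch assumes the 10 MiB cap applies to the raw bytes rather than the base64 text, which the limits do not specify; `build_content` is a hypothetical helper.

```python
import base64

MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024  # 10 MiB, assumed to mean raw bytes
MAX_IMAGES = 8

def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")

def _check_size(data: bytes, label: str) -> None:
    if len(data) > MAX_ATTACHMENT_BYTES:
        raise ValueError(f"{label} exceeds 10 MiB")

def build_content(text, images=(), files=()):
    """images: (bytes, mime) pairs; files: (bytes, filename, mime) triples."""
    if len(images) > MAX_IMAGES:
        raise ValueError("at most 8 images per request")
    parts = [{"type": "text", "text": text}]
    for data, mime in images:
        _check_size(data, "image")
        parts.append({"type": "image_url",
                      "image_url": {"url": f"data:{mime};base64,{_b64(data)}"}})
    for data, filename, mime in files:
        _check_size(data, filename)
        parts.append({"type": "input_file", "filename": filename,
                      "mime_type": mime, "file_data": _b64(data)})
    return parts

parts = build_content("What's in these?",
                      images=[(b"\x89PNG", "image/png")],
                      files=[(b"%PDF-1.7", "report.pdf", "application/pdf")])
print([p["type"] for p in parts])  # ['text', 'image_url', 'input_file']
```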

Files (temp hosting)

Short-lived, link-shareable file hosting with a 7-day expiry. It serves double duty: plain download hosting, and the upload source for attachment file_id references.

Upload · POST /v1/files

POST https://api.chernion.ai/v1/files

Multipart form upload (field file). Returns the record and a download URL.

bash

curl https://api.chernion.ai/v1/files \
  -H "Authorization: Bearer sk-chrn-..." \
  -F "file=@report.pdf"

# => {
#   "id": "file_...", "filename": "report.pdf", "size_bytes": 184320,
#   "content_type": "application/pdf", "download_url": "https://api.chernion.ai/v1/files/<token>",
#   "download_path": "/v1/files/<token>", "expires_at": "2026-06-23T12:00:00Z"
# }

Download · GET /v1/files/{token}

GET https://api.chernion.ai/v1/files/{token}

Serves the file as an attachment. 410 once expired or missing.

Metadata · GET /v1/files/{token}/info

GET https://api.chernion.ai/v1/files/{token}/info

Returns the record without the bytes, or 410 if gone.
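Because the record carries expires_at, a client can check locally whether a stored file reference is still usable before it spends a request on a 410. This sketch parses the timestamp format from the upload example; the helper names are illustrative.

```python
from datetime import datetime, timezone

def parse_expiry(record: dict) -> datetime:
    """Parse the record's expires_at (ISO 8601 with a trailing Z) as UTC."""
    return datetime.fromisoformat(record["expires_at"].replace("Z", "+00:00"))

def is_expired(record: dict, now=None) -> bool:
    """True once the current time has reached the record's expiry."""
    now = now or datetime.now(timezone.utc)
    return now >= parse_expiry(record)

record = {"id": "file_...", "filename": "report.pdf",
          "expires_at": "2026-06-23T12:00:00Z"}
day_before = datetime(2026, 6, 22, 12, 0, tzinfo=timezone.utc)
print(is_expired(record, now=day_before))  # False
```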

Delete · DELETE /v1/files/{id}

DELETE https://api.chernion.ai/v1/files/{id}

Owner-only delete by id.

Claude Code

Claude Code talks to the Anthropic endpoint, so point it at Chernion's:

bash

export ANTHROPIC_BASE_URL=https://api.chernion.ai
export ANTHROPIC_AUTH_TOKEN=sk-chrn-...
export ANTHROPIC_MODEL=opus-4.8
# optional, for its background tasks:
export ANTHROPIC_SMALL_FAST_MODEL=haiku-4.5
claude

(Windows: setx ANTHROPIC_BASE_URL https://api.chernion.ai, and so on.) Claude Code now runs on Chernion, billed from your balance.

CLI

The chernion command-line agent (version 0.1.1) runs on the same gateway: chat, generate code, read and edit files in your project, run commands, and reach the internet without leaving your shell. Same model ids, the same sk-chrn-… key, and the same per-call cost printed after every run. It also reads skills, subagents, and rules from a .chernion/ folder.

Install

bash

npm install -g @chernion/cli@latest   # the `chernion` command (alias: chrn)

chernion --version                    # chernion 0.1.1

# or run it once, no install:
npx @chernion/cli@latest chat "Hello"

Authenticate

Drop your key in the environment, or run chernion login to store it. Grab one under Dashboard → API Keys.

bash

export CHERNION_API_KEY=sk-chrn-...      # picked up automatically

# or save it (to ~/.config/chernion, Windows: %APPDATA%\chernion):
chernion login

# point at a different gateway:
export CHERNION_BASE_URL=https://api.chernion.ai

(Windows: setx CHERNION_API_KEY sk-chrn-....)

chernion chat

A prompt in, an answer out, streamed by default. Pass text for a one-shot, or run it bare for an interactive session; pipe a file in and it's appended to your prompt. Supports effort and attachments.

bash

chernion chat "Explain this stack trace" -m opus-4.8 -e high
cat error.log | chernion chat "what blew up?"
chernion chat -a diagram.png "what does this describe?"
chernion chat            # interactive: /model  /effort  /system  /clear  /exit

Every run ends with the exact cost, read straight off the response, e.g. · opus-4.8 · 1,204 tok · $0.001083.

chernion code

The coding endpoint from your shell: a prompt in, source out. Prints just the extracted code; --raw keeps the prose, -o writes a file.

bash

chernion code "binary search over a sorted int array" -l python
chernion code "a debounce hook" -l ts -o useDebounce.ts

chernion models

The live catalog: ids, vision, effort levels, and price per million tokens. No key required; read ids from here instead of hardcoding.

bash

chernion models
# id            vision  effort                  $in/Mtok  $out/Mtok
# opus-4.8        yes    low·medium·high·xhigh     9.00      45.00
# sonnet-4.6      yes    -                         ...

chernion models --json     # raw GET /v1/models

chernion files

Upload, list, and fetch the 7-day temp files. An upload prints an id you can hand straight to chat as an attachment, plus a shareable URL.

bash

chernion files upload report.pdf
# file_...  report.pdf  180 KB  expires in 7 days

chernion chat -a file_... "summarize the findings"

Flags & config

· -m, --model: any catalog slug (default sonnet-4.6, or your saved default).
· -e, --effort (Claude only): low · medium · high · xhigh (see Effort).
· -a, --attach <path|file_id>: repeatable; images & files (see Attachments). 8 images, 10 MiB each.
· -s, --system <text>: system prompt. --max-tokens <n>, --no-stream, --json.
· chernion config set model opus-4.8 saves a default; chernion balance shows what's left.

Agent tools

Run bare, chernion is a coding agent that reads, edits, and runs commands in your project, gated by the permission mode you set (--ask, --auto-edit, --plan, --yolo). The read tools run freely; the edit and shell tools go through the gate.

Tool	Class	What it does
read_file	read	Read a file with line numbers.
list_dir	read	List a directory (respects .gitignore).
glob	read	Find files by pattern, e.g. src/**/*.ts.
grep	read	Regex search across files.
fetch_url	read	HTTP/HTTPS request from your machine; returns the response as text.
write_file	edit	Create or overwrite a file (gated).
edit_file	edit	Exact-match replace with a diff preview (gated).
download_file	edit	Download a URL into the workspace (gated).
run_bash	shell	Run a command in the workspace (gated).

Internet access (new in 0.1.1): fetch_url makes an HTTP or HTTPS request from your machine and returns the body as text, for reading web pages or calling JSON / REST APIs (it caps long text responses and flags binary content). download_file saves a URL straight into the workspace (images, fonts, audio/video, archives); the saved path is confined to the workspace, the same as the file tools.

Slash commands (REPL)

Type these inside the interactive session.

Command	What it does
/model	Switch model. No argument opens a picker.
/effort	Set reasoning effort. No argument opens a picker.
/mode	Open the permission mode picker.
/ask	Switch to ask mode (prompt before writes and shell).
/auto-edit	Apply edits without asking, still prompt for shell.
/plan	Plan mode: read only, explore and propose.
/accept	Exit plan mode and apply (moves to auto-edit).
/yolo	Yolo mode: no prompts.
/system	Set a session role. /system clear resets it.
/skills	List the skills discovered under .chernion/.
/agents	List the subagents discovered under .chernion/.
/improve	Toggle auto prompt improvement (/improve on|off).
/tokens	Show session token usage. Aliases: /usage, /cost.
/clear	Clear chat history and memory (fresh start). Alias: /reset.
/compact	Summarize history to save context.
/login	How to update your key.
/help	Command list. Alias: /?.
/exit	Quit. Alias: /quit.

/clear (alias /reset): starts a completely fresh session. It wipes the model's conversation history and the running token/cost totals, and it clears the terminal screen and scrollback so the previous conversation is gone from view too. Use it when you want the agent to forget everything and start clean. This affects the interactive session only; it does not touch the server-backed memory used by the separate chernion chat command.

/compact: summarizes the current conversation to save context, the same idea as Claude's /compact. It makes one call to the model to condense the session into terse notes (decisions made, files touched, open tasks), then continues from that summary so the agent keeps its bearings while using far fewer tokens. Your visible scrollback is left intact for reference.

Auto prompt improvement

Before each turn, chernion refines your prompt into a clearer, self-contained instruction without changing its intent, then shows you the refined version. It is on by default and skips trivial inputs (greetings, very short messages, slash commands).

Turn it off for a session with /improve off (and back on with /improve on), or persist the choice by setting "improvePrompt": false in ~/.chernion/config.json.

The CLI reports errors in the same shapes as the API (see Errors & limits): a bad key is 401 invalid_api_key, an empty balance is 402 insufficient_balance. The CLI exits non-zero and prints the code.

Skills, subagents, and rules

The CLI discovers skills, subagents, and rules from a .chernion/ folder, mirroring Claude Code's .claude/ layout. Two scopes are merged: the project scope (<repo>/.chernion/) overrides the user scope (~/.chernion/) when a name clashes.

text

.chernion/
  rules/*.md             # always-on house rules (root CHERNION.md and AGENTS.md still work too)
  skills/<name>/SKILL.md # a reusable instruction pack
  agents/<name>.md       # a specialist subagent
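The scope merge can be pictured as a dictionary filled from the user scope first and the project scope second, so the later entry wins on a name clash. This is a sketch of that rule only, not the CLI's actual discovery code. It keys skills by folder name (an assumption) and runs against a throwaway directory standing in for ~/.chernion/ and <repo>/.chernion/.

```python
import tempfile
from pathlib import Path

def discover_skills(user_dir, project_dir):
    """Map skill name -> SKILL.md path; the project scope wins on a clash."""
    found = {}
    for scope in (user_dir, project_dir):  # later scope overrides earlier
        skills = Path(scope) / "skills"
        if not skills.is_dir():
            continue
        for skill_md in sorted(skills.glob("*/SKILL.md")):
            found[skill_md.parent.name] = skill_md
    return found

def make_skill(scope, name):
    """Create a minimal skills/<name>/SKILL.md for the demo."""
    folder = Path(scope) / "skills" / name
    folder.mkdir(parents=True)
    (folder / "SKILL.md").write_text("---\nname: " + name + "\n---\n")

with tempfile.TemporaryDirectory() as tmp:
    user, project = Path(tmp) / "user", Path(tmp) / "project"
    make_skill(user, "review")
    make_skill(user, "deploy")
    make_skill(project, "review")
    found = discover_skills(user, project)
    # Record which scope each winning SKILL.md came from.
    winners = {name: path.parts[-4] for name, path in found.items()}

print(winners)  # {'deploy': 'user', 'review': 'project'}
```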

Skills

Skills use progressive disclosure: only each skill's name and description sit in the prompt, and the agent calls the use_skill tool to load the full body when a request matches. A skill is a folder with a SKILL.md file.

Subagents

A subagent is a Markdown file with frontmatter (name, description, and optional tools and model) and a body that becomes its system prompt. The agent delegates a self-contained subtask to it through the run_agent tool. Subagents share your permission mode and cannot spawn further subagents.

Frontmatter is a small YAML subset, for example:

markdown

---
name: landing-page
description: build a polished marketing landing page
---
1. Confirm the brand and goal.
2. Build a responsive hero.
3. Ship to ./site.
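For tooling that needs to read these files, a flat key: value frontmatter like the one above can be split off in a few lines. This sketch handles only single-line string values; how the CLI parses list-valued fields such as tools is not described here, so they are left as raw strings.

```python
def parse_frontmatter(text: str):
    """Split a Markdown file into (metadata dict, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text  # unterminated block: treat everything as body
    meta = {}
    for line in lines[1:end]:
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])

doc = """---
name: landing-page
description: build a polished marketing landing page
---
1. Confirm the brand and goal.
2. Build a responsive hero.
3. Ship to ./site.
"""
meta, body = parse_frontmatter(doc)
print(meta["name"])          # landing-page
print(body.splitlines()[0])  # 1. Confirm the brand and goal.
```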

Discovery & commands

use_skill and run_agent are only offered to the model when the workspace actually defines skills or agents, so a bare workspace carries zero overhead. List what was found with the /skills and /agents slash commands.

SDKs & Tools

OpenAI SDK

python

from openai import OpenAI

client = OpenAI(base_url="https://api.chernion.ai/v1", api_key="sk-chrn-...")
client.chat.completions.create(model="sonnet-4.6", messages=[{"role": "user", "content": "Hi"}])

Anthropic SDK

python

import anthropic

client = anthropic.Anthropic(base_url="https://api.chernion.ai", api_key="sk-chrn-...")
client.messages.create(model="opus-4.8", max_tokens=1024, messages=[{"role": "user", "content": "Hi"}])

Cursor

Settings → Models → Override OpenAI Base URL https://api.chernion.ai/v1, paste your key, add a model id (e.g. sonnet-4.6).

Continue (~/.continue/config.json)

json

{
  "models": [
    {
      "title": "Chernion · Sonnet 4.6",
      "provider": "openai",
      "model": "sonnet-4.6",
      "apiBase": "https://api.chernion.ai/v1",
      "apiKey": "sk-chrn-..."
    }
  ]
}

LangChain

python

from langchain_openai import ChatOpenAI

ChatOpenAI(base_url="https://api.chernion.ai/v1", api_key="sk-chrn-...", model="sonnet-4.6")

Errors & limits

Check the HTTP status, then read the body. OpenAI routes return {"error":{"message","type","code"}}; the Anthropic route returns {"type":"error","error":{...}}.

Status	code	Meaning
400	unsupported_parameter	bad effort level, or effort on a non-Claude model
400	invalid_attachment	malformed image/file part, or image on a non-vision model
400	invalid_messages	messages array missing or malformed
401	invalid_api_key	missing / invalid / revoked key
402	insufficient_balance	top up to keep going
404	model_not_found	unknown model id
413	oversized	attachment over 10 MiB
429	rate_limited	back off; honor Retry-After
502 / 503	upstream_failed	provider hiccup; retry

Limits: 8 images per request, 10 MiB per attachment. All amounts are integer micro-USD (1 USD = 1,000,000). On 429, honor the Retry-After header.
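Both error shapes keep the detail under a top-level error key, so one accessor covers them. The sketch below pairs it with a retry decision for the retryable rows of the table: it honors a numeric Retry-After on 429 and otherwise falls back to exponential backoff. The backoff base and cap are arbitrary choices, the HTTP-date form of Retry-After is not handled, and the function names are illustrative.

```python
RETRYABLE = {429, 502, 503}

def error_payload(body):
    """Return the inner error object from either error shape, or None."""
    if not isinstance(body, dict):
        return None
    err = body.get("error")
    return err if isinstance(err, dict) else None

def retry_delay(status: int, headers: dict, attempt: int, base=1.0, cap=30.0):
    """Seconds to wait before retrying, or None when a retry won't help."""
    if status not in RETRYABLE:
        return None
    retry_after = headers.get("Retry-After")
    if status == 429 and retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # non-numeric Retry-After: fall through to backoff
    return min(cap, base * 2 ** attempt)

openai_body = {"error": {"message": "bad key", "type": "auth", "code": "invalid_api_key"}}
print(error_payload(openai_body)["code"])        # invalid_api_key
print(retry_delay(429, {"Retry-After": "7"}, 0)) # 7.0
print(retry_delay(503, {}, 2))                   # 4.0
print(retry_delay(401, {}, 0))                   # None
```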