Litepaper · v0.1
Umbra
A decentralized AI inference network where the relay can't read your prompts and workers are slashed for lying.
01 · Thesis
AI is centralizing fast. A handful of companies decide who gets access, what their models will answer, and what happens to the prompts you send them. The first wave of "decentralized AI" pushed back on that arrangement but kept two fatal compromises:
- The operator still reads everything. Prompts are routed through one orchestrator in plaintext. The privacy claim has a hole the operators admit themselves: the relay sees your prompt "to route it."
- No one checks the work. A worker can quietly run a smaller model, return garbage, or fabricate output, and still get paid. Selection is based on raw speed, with nothing verifying that inference actually happened honestly.
Umbra closes both holes. The relay routes ciphertext it cannot open, and every worker is continuously, invisibly audited and slashed for dishonest inference. The result: private, uncensored, verified AI owned by no one.
02 · Sealed routing
Umbra uses a NaCl sealed-box (X25519 + XSalsa20-Poly1305) with a fresh ephemeral keypair per message.
- Each worker holds a long-lived box keypair and publishes only its public key to the relay.
- The client asks the relay for a worker; the relay returns the chosen worker's public key.
- The client seals the prompt, plus a one-time reply key tucked inside the sealed payload, to that public key. The relay receives ephemeralPub ‖ nonce ‖ ciphertext and nothing else.
- The worker opens it with its secret key (which never leaves its machine), runs the model, and seals every token back to the client's one-time reply key.
The relay routes opaque bytes in both directions. It can count tokens for billing, but it cannot reconstruct a single word — even if compromised or subpoenaed. Nothing is stored; jobs live in memory and are discarded.
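To make the relay's view concrete, here is a minimal sketch of the ephemeralPub ‖ nonce ‖ ciphertext framing. The field sizes are the standard ones for X25519 keys (32 bytes) and XSalsa20-Poly1305 nonces (24 bytes) and tags (16 bytes). The function names are hypothetical, and the encryption itself is out of scope here: it would come from a NaCl binding, which this sketch does not include.

```python
PUBKEY_LEN = 32  # X25519 public key, bytes
NONCE_LEN = 24   # XSalsa20-Poly1305 nonce, bytes
TAG_LEN = 16     # Poly1305 tag carried inside a NaCl box ciphertext


def pack_envelope(ephemeral_pub: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Concatenate the three fields in the order the relay receives them."""
    if len(ephemeral_pub) != PUBKEY_LEN:
        raise ValueError("ephemeral public key must be 32 bytes")
    if len(nonce) != NONCE_LEN:
        raise ValueError("nonce must be 24 bytes")
    if len(ciphertext) < TAG_LEN:
        raise ValueError("ciphertext shorter than the authentication tag")
    return ephemeral_pub + nonce + ciphertext


def unpack_envelope(envelope: bytes) -> tuple[bytes, bytes, bytes]:
    """Split an envelope back into (ephemeral_pub, nonce, ciphertext)."""
    if len(envelope) < PUBKEY_LEN + NONCE_LEN + TAG_LEN:
        raise ValueError("envelope too short")
    ephemeral_pub = envelope[:PUBKEY_LEN]
    nonce = envelope[PUBKEY_LEN:PUBKEY_LEN + NONCE_LEN]
    ciphertext = envelope[PUBKEY_LEN + NONCE_LEN:]
    return ephemeral_pub, nonce, ciphertext
```

Splitting the envelope needs no key material, which is the point: the relay can frame, size, and forward the message, while only the worker's secret key can open the ciphertext field.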
Honest scope: the assigned worker does decrypt your prompt to run it; that is inherent to inference. What Umbra removes is the central party that sees everything. On the sealed path, no operator can read, log, or correlate your prompts, and the worker never learns your identity. The browser chat seals client-side. The HTTP API seals at the gateway, which is the TLS terminus, so API prompts exist in plaintext at the gateway before they are sealed. Both are documented, neither is overstated.
03 · Verifiable inference
Privacy is worthless if the answer is fake. Umbra keeps workers honest with continuous canary audits and stake at risk.
- Stake to serve. A worker bonds $UMBRA. Honesty is backed by money it can lose.
- Indistinguishable canaries. The relay periodically sends each worker a challenge sealed exactly like real traffic — there is no "this is a test" tell. The worker can't behave only when watched, because it can't tell when it's watched.
- Greedy reference. The auditor recomputes the honest answer for the worker's claimed model at temperature 0 (deterministic) and compares byte-for-byte. A worker secretly running a cheaper model diverges and is caught.
- Slash + eject. A failed audit slashes stake and drops reputation. Below a floor, the worker is ejected from the network.
- Reputation-weighted routing. Workers are chosen by speed × reputation with a floor, so trust, not just throughput, earns work, while new honest workers still get traffic.
- Commit-reveal. Each answer is bound by a commitment H(jobId ‖ model ‖ salt ‖ output), so a worker can't swap its answer after the fact or copy a peer's in a cross-check.
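The commit-reveal step above can be sketched as follows. The litepaper does not name the hash H, so SHA-256 is an assumption here, as is the 4-byte length prefix on each field (added so that concatenation is unambiguous). The function names are hypothetical.

```python
import hashlib
import hmac


def _frame(field: bytes) -> bytes:
    """Length-prefix a field so ("ab", "c") and ("a", "bc") hash differently."""
    return len(field).to_bytes(4, "big") + field


def commit(job_id: str, model: str, salt: bytes, output: bytes) -> bytes:
    """Compute H(jobId || model || salt || output), assuming H is SHA-256."""
    h = hashlib.sha256()
    for field in (job_id.encode(), model.encode(), salt, output):
        h.update(_frame(field))
    return h.digest()


def verify_reveal(commitment: bytes, job_id: str, model: str,
                  salt: bytes, output: bytes) -> bool:
    """Check a revealed (salt, output) against the earlier commitment."""
    expected = commit(job_id, model, salt, output)
    return hmac.compare_digest(commitment, expected)
```

A worker would publish the commitment first and reveal the salt and output later. Because the salt stays secret until the reveal, a peer cannot derive the output from the commitment, and any change to the output after committing makes `verify_reveal` return `False`.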
In production the canary reference is produced by a model-attested committee rather than a single recompute, so the guarantee extends to real GPU models (whose greedy output is also deterministic per model). The demo network ships a deterministic reference backend so the full audit-and-slash loop runs with zero GPUs.
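As an illustration of the audit-and-slash loop with a deterministic reference, consider the toy sketch below. Everything numeric in it is a hypothetical placeholder (the 10% slash, the 0.25 reputation penalty, the 0.5 floor), and the hash-based `reference_output` is only a stand-in for a real deterministic backend, not the network's actual one.

```python
import hashlib
from dataclasses import dataclass

SLASH_PERCENT = 10         # hypothetical: share of stake lost per failed audit
REPUTATION_PENALTY = 0.25  # hypothetical: reputation lost per failed audit
REPUTATION_FLOOR = 0.5     # hypothetical: below this the worker is ejected


@dataclass
class Worker:
    claimed_model: str
    stake: int
    reputation: float = 1.0
    ejected: bool = False


def reference_output(model: str, prompt: str) -> bytes:
    """Toy deterministic backend: same (model, prompt) gives the same bytes."""
    return hashlib.sha256(f"{model}\n{prompt}".encode()).hexdigest().encode()


def audit(worker: Worker, prompt: str, worker_output: bytes) -> bool:
    """Compare byte-for-byte against the reference; slash on mismatch."""
    expected = reference_output(worker.claimed_model, prompt)
    if worker_output == expected:
        return True
    worker.stake -= worker.stake * SLASH_PERCENT // 100
    worker.reputation = max(0.0, worker.reputation - REPUTATION_PENALTY)
    if worker.reputation < REPUTATION_FLOOR:
        worker.ejected = True
    return False
```

A worker that claims `umbra-max` but answers with the output of a different model fails the comparison, because the reference is always computed for the claimed model.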
04 · Architecture
Three components, same shape as any inference network — but the trust assumptions are inverted.
Client
The web app at umbracompute.com (X login → auto Solana wallet) or any OpenAI-compatible framework via the API. Seals prompts, decrypts streamed tokens.
Relay
A stateless router: worker registry, reputation-weighted selection, a per-model queue, settlement, and the audit loop. It holds keys to nothing and stores no conversations.
Workers
- Browser (WebGPU) — serve Umbra Pro from a tab, no install.
- Native (ollama: CUDA / Metal / Vulkan) — serve Umbra Max on a real GPU.
- Image (ComfyUI + Chroma1-HD) — serve uncensored image generation.
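The relay's reputation-weighted selection could look roughly like the sketch below. The litepaper gives the rule as speed × reputation with a floor but not the exact formula, so applying the floor to the final weight is an assumption, as are the floor value, the units of speed, and the function names.

```python
import random

WEIGHT_FLOOR = 1.0  # hypothetical minimum weight so new workers still get traffic


def routing_weight(tokens_per_sec: float, reputation: float) -> float:
    """Weight = speed x reputation, never below the floor (assumed form)."""
    return max(tokens_per_sec * reputation, WEIGHT_FLOOR)


def pick_worker(workers: dict[str, tuple[float, float]], rng: random.Random) -> str:
    """Pick one worker id at random, proportionally to its routing weight.

    `workers` maps a worker id to (tokens_per_sec, reputation).
    """
    ids = list(workers)
    weights = [routing_weight(*workers[worker_id]) for worker_id in ids]
    return rng.choices(ids, weights=weights, k=1)[0]
```

Sampling in proportion to weight, rather than always taking the top worker, is what lets a low-weight newcomer receive some jobs and build reputation.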
05 · Economics
Inference is paid in credits. 1 credit = $0.01, bought with USDC. Pricing is flat per request — you know the cost before you send.
| Model | Credits | USD | Runs on |
| umbra-pro | 8 | $0.08 | browser GPUs |
| umbra-max | 12 | $0.12 | native GPUs |
| umbra-max-think | 16 | $0.16 | native GPUs |
| umbra-image | 18 | $0.18 | image GPUs |
Each job splits 70% worker / 25% treasury / 5% referral (or 80 / 15 / 5 when the worker has a staked boost). The referral cut comes out of the protocol's share, never the worker's. Credits are refunded automatically if a job fails.
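The split arithmetic can be worked through in a few lines. This sketch uses integer micro-dollars (1 credit = $0.01 = 10,000 micro-dollars) to avoid floating-point rounding. One assumption is labeled in the code: when a job has no referrer, the 5% stays with the treasury, which is consistent with the referral cut coming out of the protocol's share but is not stated explicitly above.

```python
MICRO_USD_PER_CREDIT = 10_000  # 1 credit = $0.01

# Flat per-request prices in credits, from the pricing table.
PRICES = {
    "umbra-pro": 8,
    "umbra-max": 12,
    "umbra-max-think": 16,
    "umbra-image": 18,
}


def split_job(model: str, boosted: bool = False, has_referrer: bool = True) -> dict[str, int]:
    """Return the worker / treasury / referral amounts in micro-dollars."""
    total = PRICES[model] * MICRO_USD_PER_CREDIT
    worker_pct = 80 if boosted else 70
    # Assumption: with no referrer, the referral share stays in the treasury.
    referral_pct = 5 if has_referrer else 0
    worker = total * worker_pct // 100
    referral = total * referral_pct // 100
    treasury = total - worker - referral
    return {"worker": worker, "treasury": treasury, "referral": referral}
```

For an `umbra-pro` request (8 credits, $0.08) without a boost, this gives $0.056 to the worker, $0.020 to the treasury, and $0.004 to the referrer.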
06 · $UMBRA
$UMBRA is live on pump.fun.
Contract: k6DhSa48q8itLQt33v6kHdjEzf5YJAHW1xdyi9Epump
You never need $UMBRA to use Umbra — inference is USDC. $UMBRA is the value-accrual layer. Two streams feed one treasury:
- Compute margin — the treasury's cut of every paid job.
- Trading fees — a share of $UMBRA market activity.
The treasury splits automatically, once a day:
| Share | Action |
| 50% | Buyback & burn — bought on the open market and permanently burned. Supply shrinks as usage grows. |
| 50% | Staker rewards — paid to everyone staking $UMBRA, in USDC. |
More usage → more buyback and bigger staker rewards. The token captures the network's growth.
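A rough sketch of the daily treasury split follows, again in integer micro-dollars. The 50/50 split and the proportional staker payout come from this litepaper; how rounding remainders are handled is not specified, so the choices here (the odd micro-dollar goes to buyback, and payout dust stays unpaid) are assumptions, as are the function names.

```python
def daily_treasury_split(treasury_micro_usd: int) -> tuple[int, int]:
    """Return (buyback_and_burn, staker_pool) for one day's treasury balance."""
    staker_pool = treasury_micro_usd // 2
    buyback = treasury_micro_usd - staker_pool  # assumption: odd unit goes to buyback
    return buyback, staker_pool


def staker_rewards(staker_pool: int, matured_stakes: dict[str, int]) -> dict[str, int]:
    """Pay each staker a share of the pool proportional to their matured stake."""
    total = sum(matured_stakes.values())
    if total == 0:
        return {staker: 0 for staker in matured_stakes}
    # Integer division rounds down; any dust is left unpaid (assumption).
    return {
        staker: staker_pool * amount // total
        for staker, amount in matured_stakes.items()
    }
```

With a $1.00 treasury day and two stakers holding 3 and 1 units of matured stake, the staker pool is $0.50, paid out as $0.375 and $0.125.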
07 · Staking
A single stake does three things at once, from a self-custody on-chain vault only you can withdraw from:
- Earns USDC — your proportional share of the staker-rewards half of the treasury, paid daily.
- Free daily credits — a per-staker allowance spent before any paid credits, so active stakers run real workloads without spending USDC.
- Worker boost — stake ≥ 500,000 $UMBRA (aged 24h) and your worker earns 80% instead of 70% on every job.
Deposits age 24h before they count (no sniping a payout), and unstaking pulls newest deposits first so your matured stake keeps earning. Workers stake against their own honesty — the same bond that earns rewards is the bond slashed if they fail an audit.
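The aging and newest-first unstaking rules can be sketched as a small in-memory model. This is an illustration only, not the on-chain vault: the class and method names are hypothetical, timestamps are plain Unix seconds, and deposits are assumed to arrive in time order.

```python
from dataclasses import dataclass, field

AGING_SECONDS = 24 * 60 * 60  # deposits count only after 24h
BOOST_THRESHOLD = 500_000     # matured $UMBRA needed for the 80% worker share


@dataclass
class Deposit:
    amount: int
    deposited_at: int  # Unix seconds


@dataclass
class StakeAccount:
    deposits: list[Deposit] = field(default_factory=list)  # oldest first

    def deposit(self, amount: int, now: int) -> None:
        self.deposits.append(Deposit(amount, now))

    def matured(self, now: int) -> int:
        """Sum of deposits that have aged at least 24h."""
        return sum(d.amount for d in self.deposits
                   if now - d.deposited_at >= AGING_SECONDS)

    def unstake(self, amount: int) -> None:
        """Withdraw from the newest deposits first, so matured stake survives."""
        if amount > sum(d.amount for d in self.deposits):
            raise ValueError("insufficient stake")
        remaining = amount
        while remaining > 0:
            newest = self.deposits[-1]
            take = min(newest.amount, remaining)
            newest.amount -= take
            remaining -= take
            if newest.amount == 0:
                self.deposits.pop()

    def worker_share_percent(self, now: int) -> int:
        """80 with a matured stake at or above the threshold, else 70."""
        return 80 if self.matured(now) >= BOOST_THRESHOLD else 70
```

Taking withdrawals from the end of the list means a partial unstake consumes un-aged deposits before touching stake that is already earning.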
08 · Roadmap
- Now — sealed routing, verifiable inference + slashing, OpenAI-compatible API, browser + native workers, image generation, chat + earn apps. (this build)
- Next — on-chain settlement on Solana, model-attested canary committee for real-GPU verification, the umbra code CLI agent, federated relays so routing itself has no single operator.
- Later — TEE/confidential-compute workers for prompt privacy even from the worker, client-side verifiable receipts, permissionless model onboarding.
Umbra is research-grade software shipped as a working network. This litepaper describes the system as built and the direction of travel; it is not financial advice and $UMBRA is a utility/value-accrual token, not a security.