Litepaper · v0.1

Umbra

A decentralized AI inference network where the relay can't read your prompts and workers are slashed for lying.

01 · Thesis

AI is centralizing fast. A handful of companies decide who gets access, what models will answer, and what happens to the prompts you send them. The first wave of "decentralized AI" pushed back on the model — but kept two fatal compromises:

Umbra closes both holes. The relay routes ciphertext it cannot open, and every worker is continuously, invisibly audited and slashed for dishonest inference. The result: private, uncensored, verified AI owned by no one.

02 · Sealed routing

Umbra uses a NaCl sealed-box (X25519 + XSalsa20-Poly1305) with a fresh ephemeral keypair per message.

The relay routes opaque bytes in both directions. It can count tokens for billing, but it cannot reconstruct a single word — even if compromised or subpoenaed. Nothing is stored; jobs live in memory and are discarded.
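The relay's side of this can be sketched in a few lines. This is a minimal illustration, not Umbra's actual code: the `Relay` class and its `submit` and `dispatch` methods are hypothetical names, and the sealed box is represented by random bytes (in practice a client would produce it with a sealed-box implementation such as PyNaCl's `SealedBox`). The point is only that the relay holds no key, never parses the payload, and keeps the job in memory until it is handed off.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Relay:
    """Hypothetical relay: holds no keys and keeps jobs only in memory."""
    jobs: dict[str, bytes] = field(default_factory=dict)

    def submit(self, sealed_prompt: bytes) -> str:
        # The payload is never parsed or decrypted; to the relay it is just bytes.
        job_id = secrets.token_hex(8)
        self.jobs[job_id] = sealed_prompt
        return job_id

    def dispatch(self, job_id: str) -> bytes:
        # pop() hands the ciphertext to the worker and drops the relay's copy.
        return self.jobs.pop(job_id)


relay = Relay()
# Stand-in for a sealed box produced client-side; random bytes for illustration.
sealed = secrets.token_bytes(64)
job_id = relay.submit(sealed)
delivered = relay.dispatch(job_id)
```

After `dispatch`, the relay has nothing left to hand over: the job table is empty and it never held a key that could open the payload.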

Honest scope: the assigned worker does decrypt your prompt to run it — that's inherent to inference. What Umbra removes is the central party that sees everything. No operator can read, log, or correlate your prompts, and the worker never learns your identity. The browser chat seals client-side; the HTTP API seals at the gateway (TLS terminus). Both are documented, neither is overstated.

03 · Verifiable inference

Privacy is worthless if the answer is fake. Umbra keeps workers honest with continuous canary audits and stake at risk.

In production the canary reference is produced by a model-attested committee rather than a single recompute, so the guarantee extends to real GPU models (whose greedy output is also deterministic per model). The demo network ships a deterministic reference backend so the full audit-and-slash loop runs with zero GPUs.
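A rough sketch of the audit-and-slash loop against a deterministic reference backend follows. Everything named here is an assumption made for illustration: `reference_backend` is a hash-based stand-in for a model, `SLASH_FRACTION` is an invented value (the text does not state how much stake is slashed), and a single recompute is used where production would use the attested committee.

```python
import hashlib
from dataclasses import dataclass

SLASH_FRACTION = 0.5  # assumption: the source does not state a slash size


def reference_backend(prompt: str) -> str:
    """Deterministic stand-in for a model: same prompt, same output."""
    return hashlib.sha256(prompt.encode()).hexdigest()[:16]


@dataclass
class Worker:
    name: str
    stake: float
    honest: bool = True

    def infer(self, prompt: str) -> str:
        # A dishonest worker returns something other than the real output.
        return reference_backend(prompt) if self.honest else "garbage"


def audit(worker: Worker, canary_prompt: str) -> bool:
    """Send a canary, compare with the reference, slash on mismatch."""
    passed = worker.infer(canary_prompt) == reference_backend(canary_prompt)
    if not passed:
        worker.stake -= worker.stake * SLASH_FRACTION
    return passed


good = Worker("good", stake=100.0)
bad = Worker("bad", stake=100.0, honest=False)
good_passed = audit(good, "canary-001")
bad_passed = audit(bad, "canary-001")
```

Because the canary looks like any other job to the worker, a worker cannot behave honestly only when it is being checked.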

04 · Architecture

Three components, same shape as any inference network — but the trust assumptions are inverted.

Client

The web app at umbracompute.com (X login → auto Solana wallet) or any OpenAI-compatible framework via the API. Seals prompts, decrypts streamed tokens.

Relay

A stateless router: worker registry, reputation-weighted selection, a per-model queue, settlement, and the audit loop. It holds keys to nothing and stores no conversations.
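Reputation-weighted selection could look roughly like the sketch below. The scoring scale and the registry contents are made up for illustration, and the text does not say how Umbra computes or weights reputation; this simply picks a worker with probability proportional to its score.

```python
import random


def pick_worker(reputation: dict[str, float], rng: random.Random) -> str:
    """Pick one worker with probability proportional to its reputation."""
    names = list(reputation)
    weights = [reputation[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical registry for one model queue: worker id -> reputation score.
registry = {"worker-a": 9.0, "worker-b": 1.0, "worker-c": 0.0}
rng = random.Random(42)
picks = [pick_worker(registry, rng) for _ in range(1000)]
counts = {name: picks.count(name) for name in registry}
```

A worker whose reputation has dropped to zero receives no jobs, while low-reputation workers still get occasional traffic and a chance to recover.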

Workers

05 · Economics

Inference is paid in credits. 1 credit = $0.01, bought with USDC. Pricing is flat per request — you know the cost before you send.

Model             Credits  USD    Runs on
umbra-pro         8        $0.08  browser GPUs
umbra-max         12       $0.12  native GPUs
umbra-max-think   16       $0.16  native GPUs
umbra-image       18       $0.18  image GPUs

Each job splits 70% worker / 25% treasury / 5% referral (or 80 / 15 / 5 when the worker has a staked boost). The referral cut comes out of the protocol's share, never the worker's. Credits are refunded automatically if a job fails.
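As a worked example of the split, the sketch below computes each party's share of one job in micro-USD (an integer unit chosen here to avoid floating-point rounding; it is not something the text specifies). The function name `split_job` is hypothetical; the percentages and the credit price come from the text above.

```python
MICRO_USD_PER_CREDIT = 10_000  # 1 credit = $0.01 = 10,000 micro-USD

# Shares in percent, taken from the text: (worker, treasury, referral).
BASE_SPLIT = (70, 25, 5)
BOOSTED_SPLIT = (80, 15, 5)


def split_job(credits: int, boosted: bool = False) -> dict[str, int]:
    """Return each party's share of one job, in micro-USD."""
    worker_pct, _treasury_pct, referral_pct = BOOSTED_SPLIT if boosted else BASE_SPLIT
    total = credits * MICRO_USD_PER_CREDIT
    worker = total * worker_pct // 100
    referral = total * referral_pct // 100
    # Treasury takes the remainder so the three parts always sum to the total.
    treasury = total - worker - referral
    return {"worker": worker, "treasury": treasury, "referral": referral}


base = split_job(8)                    # umbra-pro, 8 credits = $0.08
boosted = split_job(8, boosted=True)   # same job, worker has a staked boost
```

For an 8-credit job the worker's share rises from $0.056 to $0.064 with the boost, and the extra comes entirely out of the treasury share; the referral share stays at $0.004 in both cases.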

06 · $UMBRA

$UMBRA is live on pump.fun.
Contract: k6DhSa48q8itLQt33v6kHdjEzf5YJAHW1xdyi9Epump

You never need $UMBRA to use Umbra — inference is USDC. $UMBRA is the value-accrual layer. Two streams feed one treasury:

The treasury splits automatically, once a day:

Share  Action
50%    Buyback & burn: bought on the open market and permanently burned. Supply shrinks as usage grows.
50%    Staker rewards: paid to everyone staking $UMBRA, in USDC.

More usage → more buyback and bigger staker rewards. The token captures the network's growth.
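The daily treasury split can be illustrated as follows. This is a sketch under two assumptions that the text does not confirm: staker rewards are distributed pro rata by staked amount, and any rounding dust is simply left undistributed. The function name and the micro-USD unit are likewise illustrative.

```python
def daily_split(treasury_micro_usd: int, stakes: dict[str, int]) -> tuple[int, dict[str, int]]:
    """Split one day's treasury 50/50 into a buyback budget and staker rewards."""
    buyback = treasury_micro_usd // 2
    reward_pool = treasury_micro_usd - buyback
    total_staked = sum(stakes.values())
    if total_staked == 0:
        return buyback, {}
    # Assumption: rewards are pro rata by staked amount.
    rewards = {who: reward_pool * amount // total_staked for who, amount in stakes.items()}
    return buyback, rewards


# $1.00 of treasury inflow, two hypothetical stakers holding 300 and 100 tokens.
buyback, rewards = daily_split(1_000_000, {"alice": 300, "bob": 100})
```

Here half of the inflow ($0.50) goes to buyback, and the other half is shared 3:1 between the two stakers.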

07 · Staking

A single stake does three things at once, from a self-custody on-chain vault only you can withdraw from:

Deposits age 24h before they count (no sniping a payout), and unstaking pulls newest deposits first so your matured stake keeps earning. Workers stake against their own honesty — the same bond that earns rewards is the bond slashed if they fail an audit.
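The aging and withdrawal rules can be sketched with a small in-memory ledger. This illustrates only the two rules stated above (24h maturity, newest deposits withdrawn first); the `StakeLedger` class is hypothetical and says nothing about how the on-chain vault is actually implemented.

```python
from dataclasses import dataclass, field

MATURITY_SECONDS = 24 * 60 * 60  # deposits count only after 24h


@dataclass
class StakeLedger:
    # Each deposit is [timestamp, amount], kept oldest first.
    deposits: list[list[int]] = field(default_factory=list)

    def deposit(self, amount: int, now: int) -> None:
        self.deposits.append([now, amount])

    def matured(self, now: int) -> int:
        """Stake that counts toward payouts at time `now`."""
        return sum(a for t, a in self.deposits if now - t >= MATURITY_SECONDS)

    def unstake(self, amount: int) -> None:
        """Withdraw newest deposits first so older, matured stake is untouched."""
        if amount > sum(a for _, a in self.deposits):
            raise ValueError("insufficient stake")
        while amount > 0:
            newest = self.deposits[-1]
            taken = min(amount, newest[1])
            newest[1] -= taken
            amount -= taken
            if newest[1] == 0:
                self.deposits.pop()


ledger = StakeLedger()
ledger.deposit(100, now=0)
ledger.deposit(50, now=90_000)          # 25h after the first deposit
before = ledger.matured(now=100_000)    # only the first deposit has aged 24h
ledger.unstake(50)                      # comes entirely out of the newest deposit
after = ledger.matured(now=100_000)
```

Withdrawing 50 removes the unmatured deposit and leaves the matured 100 in place, so the amount that counts toward the next payout is unchanged.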

08 · Roadmap

Umbra is research-grade software shipped as a working network. This litepaper describes the system as built and the direction of travel. It is not financial advice, and $UMBRA is a utility/value-accrual token, not a security.