Every inference API says your data is “private.” Almost none let you check. “We don’t log prompts” is a policy, not a proof — you’re trusting a screenshot of a dashboard toggle. Umbra’s bet is different: you should be able to cryptographically verify that your prompt was never readable by anyone, and if you can’t verify it, you shouldn’t believe it.
Here’s the whole chain, from the silicon up.
1. The prompt is decrypted only inside a hardened process
Models run in-process (llama.cpp) on a provider’s Apple-Silicon Mac. Your prompt
is re-encrypted to that specific provider’s attested key and decrypted only inside
a process protected by PT_DENY_ATTACH (no debugger can attach), Apple’s Hardened
Runtime (no memory inspection via Mach APIs), and SIP immutability (a reboot to
strip those protections kills the process and wipes memory). It is never written
to disk, never logged.
2. The machine’s owner can’t read it, and the hardware is verifiably genuine
The provider holds a hardware-bound key in the Secure Enclave. Each response carries an attestation chain that traces to Apple’s roots, so you can confirm you’re talking to a genuine, integrity-checked Apple device whose owner cannot extract the key or read process memory. You verify this in your browser (an in-page X.509 check) — not on our say-so.
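To make the shape of that check concrete, here is a minimal sketch of the structural half of a chain walk, not Umbra's actual verifier. The Cert type, chain_is_trusted, and the signature_ok callback are all hypothetical names for illustration. The cryptographic signature check is deliberately left to the caller, and a real X.509 validator would also check validity periods, key usage, and revocation.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Cert:
    """Hypothetical, simplified stand-in for a parsed X.509 certificate."""
    subject: str
    issuer: str
    der: bytes  # raw encoded certificate bytes


def chain_is_trusted(
    chain: Sequence[Cert],
    pinned_root_sha256: str,
    signature_ok: Callable[[Cert, Cert], bool],
) -> bool:
    """Check a leaf-first chain: issuer linkage, signatures, and a pinned root.

    signature_ok(child, parent) must return True only if parent's key
    signed child; the real cryptographic check is supplied by the caller.
    """
    if not chain:
        return False
    # Each certificate must name the next one in the chain as its issuer.
    for child, parent in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return False
        if not signature_ok(child, parent):
            return False
    # The last certificate must be self-issued and match the root pinned
    # in advance (for example, a vendor root shipped with the page).
    root = chain[-1]
    if root.issuer != root.subject:
        return False
    return hashlib.sha256(root.der).hexdigest() == pinned_root_sha256
```

Pinning the root by hash means a chain that is internally consistent but ends at someone else's root is still rejected, which is the property "traces to Apple's roots" depends on.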
3. The operator can’t read it either
The usual hole in “private inference” is the broker in the middle: the company routing your traffic. Umbra’s coordinator runs inside an AMD SEV-SNP confidential VM. Its memory is encrypted and the VM is attested, so the operator (us) can’t snapshot RAM to read prompts in flight. The trust you extend stops at “Apple’s and AMD’s hardware does what its makers claim,” not “trust the startup.”
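For a sense of what a client-side check on the coordinator could look like, here is a hedged sketch. It assumes the report's signature has already been verified against AMD's certificate chain, that an expected launch measurement is published somewhere you trust, and that freshness is bound by putting a SHA-512 of your nonce in the report's 64-byte report data field. SnpReport and report_matches are hypothetical names, and the binding scheme is an assumption, not Umbra's documented protocol.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpReport:
    """Hypothetical parsed view of a report whose signature is already verified."""
    measurement: bytes  # launch measurement of the VM image
    report_data: bytes  # caller-supplied data bound into the report


def report_matches(report: SnpReport, expected_measurement: bytes, nonce: bytes) -> bool:
    """Accept only the expected VM image, and only a report made for our nonce."""
    # Assumed binding scheme: report_data == SHA-512(nonce), which is 64 bytes.
    fresh = hashlib.sha512(nonce).digest()
    return (
        hmac.compare_digest(report.measurement, expected_measurement)
        and hmac.compare_digest(report.report_data, fresh)
    )
```

The measurement check answers "is this the coordinator build I expected?", and the nonce check answers "was this report produced for me, just now, rather than replayed?".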
4. Hardware-attested by default
Every request rides the hardware trust tier by default — the router only sends
you to providers that pass the attested-hardware bar, and weaker providers never
see the prompt. (A trust_level field still exists on the API for callers that
want to opt down; the console no longer surfaces it because the default is the
right choice for sensitive prompts.)
client.chat.completions.create(
    model="dolphin-2.9-llama3-8b",
    messages=[{"role": "user", "content": "..."}],
    # hardware attestation is the default — no extra field needed
)
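The routing rule described above can be pictured as a simple filter. This is a sketch under stated assumptions, not the router's real code: the tier names in TrustLevel, the Provider type, and the eligible function are hypothetical, since the actual trust_level values are not listed here.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, List


class TrustLevel(IntEnum):
    """Hypothetical tier names; the real trust_level values may differ."""
    NONE = 0
    SOFTWARE = 1
    HARDWARE = 2


@dataclass(frozen=True)
class Provider:
    name: str
    trust: TrustLevel


def eligible(
    providers: Iterable[Provider],
    required: TrustLevel = TrustLevel.HARDWARE,
) -> List[Provider]:
    """Return only providers at or above the required tier.

    The default is the strictest tier, so a caller has to opt down explicitly.
    """
    return [p for p in providers if p.trust >= required]
```

Because the filter runs before any prompt is forwarded, a provider below the bar is never a candidate for the request at all.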
Why this matters more than price (even though we’re also cheaper)
If you’re building in legal, medical, security, or finance — or any product where “we sent your users’ text to a vendor who logs it” is a non-starter — a policy promise isn’t enough. A verifiable chain is. That’s the difference between “trust us” and “check for yourself.”
Swap your base_url and keep your SDK: the quickstart walks through it. The
adversary model is spelled out in the threat model.
(Umbra is an experimental alpha. Verify the claims yourself — that’s the point.)