
How Apple attestation + AMD SEV-SNP prove your AI prompts are private

June 27, 2026 · Most “private” LLM APIs just promise not to log your prompts. Umbra lets you cryptographically verify that your prompt was never readable by the host or by the operator. Here’s the attestation chain, end to end.

Every inference API says your data is “private.” Almost none let you check. “We don’t log prompts” is a policy, not a proof: you’re trusting a screenshot of a dashboard toggle. Umbra’s bet is different. You should be able to cryptographically verify that your prompt was never readable by anyone outside the attested inference process, and if you can’t verify it, you shouldn’t believe it.

Here’s the whole chain, from the silicon up.

1. The prompt is decrypted only inside a hardened process

Models run in-process (via llama.cpp) on a provider’s Apple Silicon Mac. Your prompt is re-encrypted to that specific provider’s attested key and decrypted only inside a process protected three ways: PT_DENY_ATTACH, so no debugger can attach; Apple’s Hardened Runtime, so other processes can’t inspect its memory through Mach APIs; and System Integrity Protection (SIP), which keeps those protections in place, because turning SIP off requires a reboot that kills the process and discards its memory. The prompt is never written to disk and never logged.
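Most of these protections are build or OS settings (the Hardened Runtime is switched on at code-signing time, for example with `codesign --options runtime`, and SIP is a system setting), but PT_DENY_ATTACH is a single call the process makes on itself. Here is a minimal sketch in Python via ctypes, assuming macOS; the real provider runs llama.cpp and would make the equivalent `ptrace(PT_DENY_ATTACH, 0, 0, 0)` call in C. The function name `deny_debugger_attach` is hypothetical, and the request value 31 is taken from macOS’s `<sys/ptrace.h>`.

```python
import ctypes
import ctypes.util
import sys

# Request number from macOS <sys/ptrace.h>; the same number means something else on Linux.
PT_DENY_ATTACH = 31


def deny_debugger_attach() -> bool:
    """Ask the macOS kernel to refuse debugger attaches to this process.

    Returns True once the protection is in place, False on non-macOS platforms.
    Intended to be called once, early, before any prompt is decrypted.
    """
    if sys.platform != "darwin":
        return False  # never issue request 31 on a non-macOS kernel
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    # int ptrace(int request, pid_t pid, caddr_t addr, int data)
    libc.ptrace.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_void_p, ctypes.c_int]
    libc.ptrace.restype = ctypes.c_int
    if libc.ptrace(PT_DENY_ATTACH, 0, None, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, "ptrace(PT_DENY_ATTACH) failed")
    return True
```

On its own, PT_DENY_ATTACH is a speed bump for anyone with kernel-level control, which is why it only means something alongside SIP and the Hardened Runtime. Per the macOS ptrace man page, if a debugger is already attached when the call is made, the process exits instead of continuing.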

2. The machine’s owner can’t read it, and the hardware proves it’s genuine

The provider holds a hardware-bound key in the Secure Enclave. Each response carries an attestation chain that traces back to Apple’s root certificates, so you can confirm you’re talking to a genuine Apple device with its integrity protections intact. Because the key never leaves the Secure Enclave, the owner can’t extract it, and the protections from step 1 keep them out of process memory. You verify this yourself in the browser with an in-page X.509 check, not on our say-so.
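Umbra’s check runs in the browser, but the logic behind “traces back to Apple’s roots” is easy to show in Python. This is a simplified sketch, not Umbra’s verifier: `Cert` is a hypothetical stand-in for a parsed X.509 certificate, `verify_sig` is an injected callback standing in for the real signature check (which needs a crypto library), and the pinned root fingerprint is a value you would obtain from Apple, not one given here. A full X.509 path validation also checks validity periods, basic constraints, and Apple’s attestation-specific extensions, none of which this sketch does.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Cert:
    # Hypothetical stand-in for a parsed X.509 certificate.
    subject: str
    issuer: str
    der: bytes  # raw DER encoding, used here only for root pinning


# Assumed callback: True if `child`'s signature verifies under `parent`'s public key.
SignatureCheck = Callable[[Cert, Cert], bool]


def verify_chain(
    chain: Sequence[Cert], pinned_root_sha256: str, verify_sig: SignatureCheck
) -> Cert:
    """Walk a leaf-to-root chain and return the leaf if every link checks out.

    chain[0] is the provider's attestation certificate; chain[-1] is the root.
    """
    if len(chain) < 2:
        raise ValueError("chain must contain at least a leaf and a root")
    root = chain[-1]
    # Trust anchor: the root must be exactly the certificate we pinned.
    if hashlib.sha256(root.der).hexdigest() != pinned_root_sha256:
        raise ValueError("chain does not end at the pinned root")
    # Each certificate must name, and be signed by, the next one up.
    for child, parent in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            raise ValueError(f"{child.subject!r} was not issued by {parent.subject!r}")
        if not verify_sig(child, parent):
            raise ValueError(f"signature on {child.subject!r} does not verify")
    return chain[0]
```

Pinning the root by fingerprint, rather than accepting whatever root the chain presents, is what makes the chain end at Apple instead of at whoever assembled it.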

3. The operator can’t read it either

The usual hole in “private inference” is the broker in the middle: the company routing your traffic. Umbra’s coordinator runs inside an AMD SEV-SNP confidential VM. Its memory is encrypted with keys the host never sees, and the VM’s launch state is attested, so the operator (us) can’t snapshot RAM to read prompts in flight. The trust you extend stops at “Apple’s and AMD’s hardware does what they say it does,” not “trust the startup.”
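On the client side, checking the coordinator’s SEV-SNP attestation comes down to a few comparisons on the raw attestation report. Below is a minimal standard-library sketch, assuming the field offsets from AMD’s SEV-SNP firmware ABI specification (check them against the revision you target) and assuming the coordinator places SHA-512(nonce || TLS public key) in the 64-byte REPORT_DATA field; that binding scheme and the name `check_snp_report` are hypothetical. It deliberately skips the part that needs a crypto library: verifying the report’s ECDSA P-384 signature against the chip’s VCEK certificate and chaining that to AMD’s root keys.

```python
import hashlib
import hmac
import struct

# Field layout of the SEV-SNP ATTESTATION_REPORT (offsets per AMD's SNP ABI spec).
REPORT_SIZE = 0x4A0          # 1184 bytes including the signature
POLICY_OFFSET = 0x08         # 8-byte little-endian guest policy
REPORT_DATA_OFFSET = 0x50    # 64 bytes supplied by the guest when requesting the report
MEASUREMENT_OFFSET = 0x90    # 48-byte launch measurement (SHA-384 digest)
POLICY_DEBUG_BIT = 1 << 19   # if set, the host may decrypt guest memory via debug commands


def check_snp_report(
    report: bytes, expected_measurement: bytes, nonce: bytes, tls_pubkey: bytes
) -> None:
    """Raise ValueError unless the report matches what we expect.

    Does NOT verify the report's signature; that needs the VCEK and a crypto library.
    """
    if len(report) != REPORT_SIZE:
        raise ValueError(f"expected {REPORT_SIZE}-byte report, got {len(report)}")

    (policy,) = struct.unpack_from("<Q", report, POLICY_OFFSET)
    if policy & POLICY_DEBUG_BIT:
        raise ValueError("guest policy allows debugging; host could read VM memory")

    measurement = report[MEASUREMENT_OFFSET:MEASUREMENT_OFFSET + 48]
    if not hmac.compare_digest(measurement, expected_measurement):
        raise ValueError("launch measurement does not match the expected coordinator image")

    # Hypothetical binding: the guest put SHA-512(nonce || TLS public key) in REPORT_DATA,
    # tying this fresh report to the connection we are about to trust.
    expected_data = hashlib.sha512(nonce + tls_pubkey).digest()
    report_data = report[REPORT_DATA_OFFSET:REPORT_DATA_OFFSET + 64]
    if not hmac.compare_digest(report_data, expected_data):
        raise ValueError("REPORT_DATA does not bind this nonce and TLS key")
```

The debug-policy check matters for the “operator can’t snapshot RAM” claim: a guest launched with debugging allowed lets the host decrypt its memory through firmware debug commands, so a verifier should reject it even when the measurement matches.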

4. Hardware-attested by default

Every request uses the hardware trust tier by default: the router only sends it to providers that pass the attested-hardware bar, so weaker providers never see the prompt. (A trust_level field still exists on the API for callers who want to opt down; the console no longer surfaces it, because the default is the right choice for sensitive prompts.)

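```python
from openai import OpenAI

# Placeholders: take the real base_url and API key from the quickstart.
client = OpenAI(base_url="https://YOUR_UMBRA_BASE_URL", api_key="YOUR_API_KEY")

client.chat.completions.create(
    model="dolphin-2.9-llama3-8b",
    messages=[{"role": "user", "content": "..."}],
    # Hardware attestation is the default; no extra field is needed.
)
```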
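The default routing rule described above amounts to a filter that fails closed. A minimal sketch, with hypothetical names throughout (`TrustTier`, `Provider`, `eligible_providers`); Umbra’s actual router and tier names may differ:

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, List


class TrustTier(IntEnum):
    # Hypothetical tier names; higher values mean stronger guarantees.
    SOFTWARE = 1
    HARDWARE = 2


@dataclass(frozen=True)
class Provider:
    name: str
    tier: TrustTier
    attestation_verified: bool  # did this provider's latest attestation check pass?


def eligible_providers(
    providers: Iterable[Provider], required: TrustTier = TrustTier.HARDWARE
) -> List[Provider]:
    """Return the providers allowed to receive a prompt at the required tier."""
    eligible = [p for p in providers if p.tier >= required and p.attestation_verified]
    if not eligible:
        # Fail closed: never downgrade to a weaker tier just to serve the request.
        raise LookupError(f"no provider currently meets the {required.name} tier")
    return eligible
```

Raising when no attested provider is available, rather than quietly falling back to a weaker tier, is what keeps weaker providers from ever seeing the prompt; opting down becomes an explicit choice by the caller.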

Why this matters more than price (even though we’re also cheaper)

If you’re building in legal, medical, security, or finance — or any product where “we sent your users’ text to a vendor who logs it” is a non-starter — a policy promise isn’t enough. A verifiable chain is. That’s the difference between “trust us” and “check for yourself.”

Swap your base_url and keep your SDK: see the quickstart. The full adversary model is in the threat model.

(Umbra is an experimental alpha. Verify the claims yourself — that’s the point.)