Chapter 20: Confidential Computing and the Untrusted Hypervisor

20.1 Introduction: When the Hypervisor Is the Attacker

The threat model that motivates this chapter differs fundamentally from everything examined so far. Chapter 18 assumed a malicious external attacker: an adversary attempting to exploit CPU microarchitectural leaks through carefully crafted code running as an unprivileged process. This chapter assumes the attacker controls the infrastructure itself, whether a rogue cloud employee, a compromised virtual machine monitor, or a state actor who has obtained hypervisor-level access to a production server running thousands of tenant VMs.

This is not a theoretical concern. In 2023, Google Project Zero demonstrated that a fully-compromised hypervisor can read the entire memory contents of any guest VM in under two minutes on an unprotected x86-64 system using direct physical address access. This is not a bug — it is the intended design of conventional virtualisation. Every Extended Page Table entry maps a guest physical address to a host physical address, and the hypervisor controls all of them. The hypervisor is trusted by architectural design. In the datacentre era, where the hypervisor belongs to a third-party cloud provider, this trust may be misplaced.

Confidential computing is the hardware industry's answer: enforce guest memory isolation in silicon, so that the hypervisor's access to guest plaintext is blocked by hardware rather than merely restricted by software policy. Three major implementations have reached production: Intel Trust Domain Extensions (TDX), AMD Secure Encrypted Virtualisation with Secure Nested Paging (SEV-SNP), and the ARM Confidential Compute Architecture (CCA) with the Realm Management Extension. Each implements this guarantee through a different modification of the memory management path: a new CPU mode and a firmware-managed ownership table (TDX), a hardware reverse-map check applied to the result of every nested page table walk (SEV-SNP), and a new security state with a hardware granule protection check on every physical access (CCA).

This chapter examines each implementation at the microarchitecture level, then extends the analysis to GPU confidential computing — where NVIDIA H100 and AMD Instinct GPUs now support encrypted memory with measured performance characteristics — and to the emerging problem of confidential CXL memory, where the disaggregated physical address space of Chapter 19 requires new key management architecture that neither TDX, SEV-SNP, nor CCA fully specifies as of 2025.

Figure 20.1: Conventional virtualisation trusts the hypervisor with full guest memory visibility. Confidential computing hardware — through the PAM table (TDX), Reverse Map Table (SEV-SNP), or Granule Protection Table (CCA) — drops the trust boundary to the silicon level, making guest memory physically inaccessible to a compromised VMM regardless of page table state.

20.2 The Conventional Virtualisation Trust Model and Its Limits

Conventional virtualised systems rest on a carefully structured hierarchy of trust. Hardware — the CPU and its memory controller — forms the unconditional trust anchor. The hypervisor, running at the highest software privilege level (VMX root on x86-64, EL2 on ARM64), is trusted by the hardware to manage all resources on behalf of guest VMs. Guest operating systems trust the hypervisor to provide them with an accurate view of a machine. Guest applications trust the guest operating system. This hierarchy functions correctly as long as the hypervisor is not compromised.

The structural weakness emerges from how the hypervisor exercises its trust. On x86-64, the hypervisor controls the Extended Page Table — the data structure that maps every guest physical address (GPA) to a host physical address (HPA). By writing EPT entries, the hypervisor can redirect any GPA to any HPA it controls, including pages not belonging to that guest. By reading EPT mappings, the hypervisor can determine the HPA of any guest page and read it directly through its own virtual address space without triggering any guest-visible fault. There is no hardware mechanism in conventional x86-64 or ARM64 that prevents a hypervisor from doing this — it is, from the hardware's perspective, entirely correct behaviour.

A compromised hypervisor has several concrete attack vectors beyond simply reading memory. It can modify guest page tables (by mapping them into its own address space and writing to them), inject code by remapping a guest code page to an attacker-controlled physical frame, observe and tamper with every VM exit (I/O instructions, MSR accesses, EPT violations, and many interrupts transfer control to the hypervisor, which reads and writes the guest's register state while handling them), and remap guest DMA buffers to redirect device writes to arbitrary guest memory. Per-VM memory encryption alone does not close the remapping vector: the SEVered attack (2018) showed that a malicious hypervisor could extract plaintext from an AMD SEV guest by changing the guest's page mappings in the nested page table while a service inside the guest answered network requests.

Conventional software-based mitigations address symptoms rather than the root cause. Encrypted storage protects data at rest but not in use. vTPM measures software at boot but provides no runtime integrity. Memory scrubbing removes data after use but does not prevent reads during execution. The structural problem is that memory confidentiality and integrity require enforcement below the hypervisor — at the hardware level — not above it.

20.3 Intel TDX: Trust Domain Extensions — MMU Microarchitecture

Intel Trust Domain Extensions addresses the hypervisor trust problem by introducing a new CPU operating mode and a software module that mediates all memory ownership operations. Rather than changing the EPT structure that exists between hypervisor and guest, TDX adds a layer below the hypervisor — a small, hardware-measured firmware component called the TDX Module — that the hypervisor cannot access or modify.

The new CPU operating mode is called SEAM, for Secure Arbitration Mode. SEAM is entered via a SEAMCALL instruction and exited via SEAMRET. Only the TDX Module is permitted to run in SEAM mode; neither the VMM nor any guest OS can enter it. The CPU verifies the TDX Module's cryptographic measurement at boot and records it in a hardware register that guest software can read via attestation. The TDX Module acts as a minimal hypervisor for Trust Domains (TDs): it manages the Private EPT that maps TD guest physical addresses to host physical addresses, and it maintains the Physical Address Metadata (PAM) table that records ownership of every physical page.

The PAM table (PAMT) is the central ownership record. Every 4 KB physical page managed by TDX has a PAMT entry recording the page's type (for example, a regular TD data page, a Secure EPT page, or a TD control structure), its owner (identified by the TD's root control structure, the TDR page), and a blocking epoch used for secure page reclamation. The TDX Module consults and updates the PAMT on every SEAMCALL that adds, maps, removes, or reclaims a page, and refuses any operation that would give a physical page two owners. Per-access enforcement comes from the structures the PAMT guards. When a TD accesses a private page (a GPA whose shared bit, bit 51 with 52-bit guest physical addressing or bit 47 with 48-bit, is zero), the CPU translates it through the Secure EPT, which resides in TD-private memory and can be modified only by the TDX Module. The VMM cannot write Secure EPT entries, and the CPU refuses to let software outside SEAM mode and TD guests use the private KeyIDs that select TD encryption keys. A VMM that tries to map a TD-owned physical page into another TD or into its own address space therefore cannot obtain the TD's plaintext, regardless of what page table entries it creates.
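The ownership invariant the PAM table protects (one physical page, at most one owner) can be illustrated with a toy model. This is a minimal sketch, not the TDX Module's implementation: `ToyPamt`, `PageType`, and the integer TD handles are hypothetical, and the shared-bit position assumes a 52-bit guest physical address width.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class PageType(Enum):
    # Hypothetical subset of page types; a real PAMT entry distinguishes more.
    UNASSIGNED = "unassigned"
    TD_DATA = "TD data page"
    SECURE_EPT = "Secure EPT page"


@dataclass
class PamtEntry:
    page_type: PageType = PageType.UNASSIGNED
    owner: Optional[int] = None  # hypothetical TD handle


class ToyPamt:
    """Toy ownership table that refuses to give one physical page two owners."""

    def __init__(self) -> None:
        self.entries: Dict[int, PamtEntry] = {}  # keyed by page frame number

    def add_page(self, hpa: int, td: int, page_type: PageType = PageType.TD_DATA) -> None:
        entry = self.entries.setdefault(hpa >> 12, PamtEntry())
        if entry.page_type is not PageType.UNASSIGNED:
            raise PermissionError(f"page {hpa:#x} already owned by TD {entry.owner}")
        entry.page_type, entry.owner = page_type, td

    def remove_page(self, hpa: int, td: int) -> None:
        entry = self.entries.get(hpa >> 12)
        if entry is None or entry.owner != td:
            raise PermissionError(f"TD {td} does not own page {hpa:#x}")
        self.entries[hpa >> 12] = PamtEntry()  # back to unassigned


def is_private_gpa(gpa: int, gpa_width: int = 52) -> bool:
    """A GPA is private when its shared bit (the top GPA bit) is clear."""
    return ((gpa >> (gpa_width - 1)) & 1) == 0
```

In the real design these checks run inside SEAMCALL handlers in SEAM mode, which is why the VMM can request ownership changes but cannot bypass them.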

Figure 20.2: Intel TDX memory access pipeline for a private Trust Domain load instruction. After the guest page table walk (GVA→GPA) and private EPT walk (GPA→HPA), the PAM check validates physical ownership before TME-MK AES-256-XTS decryption. A VMM access attempt fails at the PAM check regardless of EPT entries — the TDX Module in SEAM mode mediates all ownership operations.

Memory encryption in TDX uses Intel Total Memory Encryption Multi-Key (TME-MK), an extension of Intel's Total Memory Encryption that supports many independent encryption keys simultaneously. Each TD is assigned a private KeyID, a small integer whose width is set by platform configuration and which is carried in the uppermost bits of the physical address, reducing the width available for addressing memory. The memory controller uses this KeyID to select the AES-XTS encryption key for every cache line belonging to that TD. Encryption and decryption are transparent to TD software, but the VMM cannot use the TD's key: the CPU reserves private KeyIDs for the TDX Module and TD guests, and the key material never leaves the hardware. A VMM that maps a TD's private physical page into its own address space under a shared KeyID obtains no plaintext (it reads ciphertext, or all zeros on platforms that enable TDX memory integrity), and by design no hardware fault is raised.
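Carving a KeyID out of the top of the physical address is simple bit arithmetic. The widths here are assumptions chosen for illustration: a 52-bit physical address with 6 KeyID bits, leaving 46 bits for addressing memory. Real platforms choose the KeyID width in firmware configuration.

```python
MAX_PHYS_ADDR_BITS = 52  # assumed platform physical address width
KEYID_BITS = 6           # assumed number of bits reserved for the KeyID
KEYID_SHIFT = MAX_PHYS_ADDR_BITS - KEYID_BITS


def tag_with_keyid(pa: int, keyid: int) -> int:
    """Place a KeyID in the uppermost physical address bits."""
    if pa >> KEYID_SHIFT:
        raise ValueError("address overlaps the bits reserved for the KeyID")
    if not 0 <= keyid < (1 << KEYID_BITS):
        raise ValueError("KeyID out of range")
    return (keyid << KEYID_SHIFT) | pa


def split_keyid(tagged: int):
    """Return (keyid, physical address) as the memory controller would see them."""
    return tagged >> KEYID_SHIFT, tagged & ((1 << KEYID_SHIFT) - 1)
```

With these assumed widths, only 2^46 bytes (64 TiB) of memory remain addressable; every KeyID bit reserved halves the addressable range.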

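TDX does not add new TLB-invalidation instructions; instead, the TDX Module enforces TLB coherence for private memory through a tracking protocol. Before a private page can be removed, the VMM must block its Secure EPT entry (TDH.MEM.RANGE.BLOCK), advance the TD's TLB epoch (TDH.MEM.TRACK), and force every logical processor currently running the TD to exit, typically with an inter-processor interrupt. The TDX Module refuses to remove the page (TDH.MEM.PAGE.REMOVE) until each of those processors has re-entered the TD and thereby flushed its stale translations. The shared EPT, which maps the pages a TD has deliberately exposed for I/O, stays under VMM control and is invalidated with the ordinary INVEPT instruction. TLB entries for private TD translations are tagged with the TD's identity, so a TD cannot be confused into using another TD's cached address translations.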
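The epoch idea behind secure page reclamation (a page may not be reclaimed until every processor that might cache its translation has passed through a flush point) fits in a few lines. This is a simplified sketch; `ToyTlbTracker` and its methods are hypothetical and omit the interrupts and per-processor state a real implementation needs.

```python
from typing import Dict, Iterable


class ToyTlbTracker:
    """Epoch-based TLB tracking: a blocked page may be reclaimed only after
    every vCPU has re-entered the TD (flushing its TLB) in a later epoch."""

    def __init__(self, vcpus: Iterable[str]) -> None:
        self.epoch = 0
        self.vcpu_epoch: Dict[str, int] = {v: 0 for v in vcpus}
        self.blocked_at: Dict[int, int] = {}

    def block(self, hpa: int) -> None:
        # No new translations to this page can be created from now on.
        self.blocked_at[hpa] = self.epoch

    def track(self) -> None:
        # Start a new epoch; vCPUs still running in the old one may hold stale entries.
        self.epoch += 1

    def reenter(self, vcpu: str) -> None:
        # Entering the TD flushes stale translations and records the current epoch.
        self.vcpu_epoch[vcpu] = self.epoch

    def can_remove(self, hpa: int) -> bool:
        blocked = self.blocked_at.get(hpa)
        if blocked is None:
            return False
        return self.epoch > blocked and all(e > blocked for e in self.vcpu_epoch.values())
```

The design choice is that reclamation never waits on a global flush: it waits only until each processor has independently crossed an epoch boundary.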

The attestation mechanism, TDCALL[TDG.MR.REPORT], allows a TD to generate a TDREPORT: a structure containing the TD's initial measurement (MRTD, a hash of its initial code and data), its runtime measurement registers, its configuration, and the measurement of the TDX Module. The TDREPORT is protected by a MAC computed with a platform-local key, so it can be verified only on the platform that produced it; for remote attestation, an SGX-based TD Quoting Enclave verifies the report and converts it into a Quote signed with a key whose certificate chain is rooted in Intel's PKI. The entire purpose of the PAM enforcement mechanism is to make this attestation meaningful: if the VMM could read or modify the TD's memory, attestation would prove only that a TD started with certain code, not that it continues to run that code unmodified.

20.4 AMD SEV-SNP: Secure Encrypted Virtualisation — Secure Nested Paging

AMD's approach to confidential computing arrived in four successive generations, each addressing a different attack surface. SME (Secure Memory Encryption) introduced system-wide transparent memory encryption using a single AES-128 key. SEV (Secure Encrypted Virtualisation) extended this to per-VM encryption keys, with each VM's ASID (Address Space Identifier) used by the memory controller to select the key for that VM's pages; both arrived with the first-generation EPYC processors (Naples, 2017). SEV-ES (Encrypted State, Rome, 2019) encrypted the VM save area (VMSA), the structure holding guest register values across VM exits, preventing the hypervisor from reading or modifying guest register state. SEV-SNP (Secure Nested Paging, Milan, 2021) added the critical missing piece: the Reverse Map Table, which blocks the physical page remapping attacks to which SEV and SEV-ES remained vulnerable.

The fundamental weakness of SEV and SEV-ES was that while they encrypted guest memory with per-VM keys, the hypervisor retained control of the Nested Page Table mapping guest physical addresses to host physical addresses, and the encryption provided no integrity. A hypervisor could remap a guest GPA from its legitimate HPA to a different HPA. If the new HPA belongs to another VM, the memory controller decrypts it with the wrong key and the guest sees garbage. If the new HPA belongs to the same guest, decryption succeeds and the guest reads valid plaintext from the wrong location, which an attacker can exploit by steering a guest service into returning the contents of a page it never meant to expose. Old ciphertext can also be written back to the same page to roll its contents back, because nothing detects the substitution. SEV-SNP closes this attack surface through a hardware-maintained ownership table whose entries the hypervisor cannot change without the guest noticing.
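Why per-VM encryption alone does not stop remapping can be shown with a toy model. The sketch substitutes a SHA-256-derived keystream for AES (an assumption purely for illustration; it is not secure encryption), tweaked by the host physical address as SEV's encryption is. `ToyHost` and its dictionaries are hypothetical stand-ins for DRAM, the nested page table, and the memory controller's key slots.

```python
import hashlib
from typing import Dict, Tuple


def keystream(vm_key: bytes, hpa: int, n: int) -> bytes:
    """Toy keystream bound to the VM key and the host physical address (the tweak)."""
    out = b""
    counter = 0
    while len(out) < n:
        block = vm_key + hpa.to_bytes(8, "little") + counter.to_bytes(4, "little")
        out += hashlib.sha256(block).digest()
        counter += 1
    return out[:n]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class ToyHost:
    def __init__(self) -> None:
        self.dram: Dict[int, bytes] = {}           # HPA -> ciphertext
        self.npt: Dict[Tuple[str, int], int] = {}  # (vm, GPA) -> HPA, hypervisor-controlled
        self.keys: Dict[str, bytes] = {}           # held by the memory controller only

    def guest_write(self, vm: str, gpa: int, data: bytes) -> None:
        hpa = self.npt[(vm, gpa)]
        self.dram[hpa] = xor(data, keystream(self.keys[vm], hpa, len(data)))

    def guest_read(self, vm: str, gpa: int, n: int) -> bytes:
        hpa = self.npt[(vm, gpa)]
        return xor(self.dram[hpa], keystream(self.keys[vm], hpa, n))
```

If the hypervisor points one of guest A's GPAs at another of A's own pages, the read returns that page's valid plaintext; pointing it at guest B's page returns noise. An RMP-style check rejects both remappings, the first because the recorded GPA no longer matches and the second because the owner's ASID does not.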

The Reverse Map Table is a flat array with one 16-byte entry per 4 KB physical page, covering all of physical memory. Each entry records whether the page is assigned to a guest, the ASID of the owning guest (0 for hypervisor-owned pages), the GPA at which the page is mapped in that guest, the page size (4 KB or 2 MB), and a Validated bit. Software cannot write the RMP's memory directly: the hypervisor changes entries only through the RMPUPDATE and PSMASH instructions, the guest through PVALIDATE, and the AMD Secure Processor (AMD-SP, an ARM Cortex-A5 core on the same die) manages the entries for firmware-owned and launch pages. The RMP check itself is performed in hardware. After the Nested Page Table walk resolves a GPA to an HPA, the processor looks up the RMP entry for that HPA and verifies that the accessing guest's ASID matches the recorded owner and that the GPA used matches the recorded GPA; hypervisor writes are checked to ensure they do not target guest-owned pages. A mismatch blocks the access before it completes. A guest violation is reported to the hypervisor as a nested page fault (#NPF) with the RMP bit set in its error code, and a hypervisor violation as a page fault (#PF) with the RMP bit set; the hypervisor can handle these faults but cannot make the access succeed.
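The per-access RMP rule can be written as a small predicate. This sketch simplifies the hardware: `RmpEntry`, the exception classes, and the dictionary keyed by page frame number are hypothetical, and real hardware also checks page size and VMPL permissions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RmpEntry:
    assigned: bool = False     # False means hypervisor-owned
    asid: int = 0
    gpa: Optional[int] = None  # page-aligned GPA recorded for the owner
    validated: bool = False


class NestedPageFault(Exception):
    """Stands in for the #NPF (RMP bit set) delivered to the hypervisor."""


class NotValidated(Exception):
    """Stands in for the #VC a guest receives on an unvalidated private page."""


class HostPageFault(Exception):
    """Stands in for the #PF (RMP bit set) on a forbidden hypervisor write."""


def check_guest_private_access(rmp: Dict[int, RmpEntry], hpa: int, asid: int, gpa: int) -> None:
    """Applied to the HPA produced by the nested page table walk."""
    entry = rmp.get(hpa >> 12, RmpEntry())
    if not entry.assigned or entry.asid != asid or entry.gpa != (gpa & ~0xFFF):
        raise NestedPageFault(f"RMP mismatch at HPA {hpa:#x}")
    if not entry.validated:
        raise NotValidated(f"GPA {gpa:#x} not validated")


def check_host_write(rmp: Dict[int, RmpEntry], hpa: int) -> None:
    """Hypervisor writes may only target pages the hypervisor owns."""
    if rmp.get(hpa >> 12, RmpEntry()).assigned:
        raise HostPageFault(f"hypervisor write to guest page {hpa:#x}")
```

The check compares the recorded owner against the access, not against any page table, which is why no combination of NPT entries can make a mismatched access succeed.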

Figure 20.3: AMD SEV-SNP Reverse Map Table check on every physical memory access. The RMP lookup is applied to the host physical address produced by the nested page table walk, validating the accessing ASID against the page's recorded owner. A mismatch raises an RMP fault regardless of NPT entries. The PVALIDATE handshake prevents pre-population attacks by requiring explicit guest cooperation to mark pages accessible.

The PVALIDATE instruction is the mechanism by which a guest establishes trust in a physical page before using it. The handshake proceeds as follows: the hypervisor allocates a physical page and uses RMPUPDATE to assign it to the guest's ASID at a particular GPA, which leaves the Validated bit clear; the hypervisor maps the page in the NPT; the guest then executes PVALIDATE on the corresponding GPA, which sets the Validated bit in the RMP entry for that physical page. A guest access to a private page whose Validated bit is clear raises a #VC exception inside the guest, so the hypervisor cannot hand the guest pages it has not explicitly accepted. The hypervisor can still issue RMPUPDATE on a validated page at any time (for example, to reclaim it), but every such update clears the Validated bit. If the hypervisor silently swaps or reassigns a page, the guest's next access raises #VC on a GPA it believes it has already validated, and a guest that tracks which GPAs it has validated detects the substitution. PSMASH, which splits a 2 MB RMP entry into 512 4 KB entries, is the hypervisor's way to change the page size of a validated large page without clearing its validation.
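The handshake and its detection property can be modelled directly. A minimal sketch, assuming (as this model does) that any hypervisor update clears the Validated bit and that PVALIDATE reports whether it changed anything; `ToyRmpEntry` and `NotValidated` are hypothetical names.

```python
from typing import Optional


class NotValidated(Exception):
    """Stands in for the #VC exception raised inside the guest."""


class ToyRmpEntry:
    def __init__(self) -> None:
        self.asid = 0
        self.gpa: Optional[int] = None
        self.validated = False

    def rmpupdate(self, asid: int, gpa: int) -> None:
        """Hypervisor (re)assigns the page; the Validated bit is always cleared."""
        self.asid, self.gpa, self.validated = asid, gpa, False

    def pvalidate(self, asid: int, gpa: int) -> bool:
        """Guest marks the page validated; returns False if nothing changed."""
        if (self.asid, self.gpa) != (asid, gpa):
            raise PermissionError("page not assigned to this guest at this GPA")
        changed = not self.validated
        self.validated = True
        return changed

    def guest_access(self, asid: int, gpa: int) -> None:
        if (self.asid, self.gpa) != (asid, gpa):
            raise PermissionError("RMP ownership mismatch")
        if not self.validated:
            raise NotValidated(f"GPA {gpa:#x}")
```

The case that matters is the last one in the sequence: after a silent re-assignment, the guest takes a #VC on a GPA it already validated, which it can treat as evidence of tampering rather than simply validating again.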

SEV-SNP introduces VMPLs (VM Permission Levels 0 to 3) within a single confidential VM. VMPL 0 is the most privileged level within the VM, and VMPLs 1 to 3 are progressively less privileged. This allows the guest to implement its own internal isolation hierarchy without requiring hypervisor involvement: VMPL 0 software sets, for each page, the access rights granted to the lower levels (using the RMPADJUST instruction). In a typical layout, a small service module such as COCONUT-SVSM (an implementation of the Secure VM Service Module, SVSM) runs at VMPL 0 and the guest kernel runs at a less privileged VMPL; in VMs without such a module, the guest kernel itself occupies VMPL 0. The VMPL 0 module can provide services to the rest of the VM, including secure device communication and key management, while components at VMPLs 1 to 3 cannot access VMPL 0's memory. The VMPL mechanism essentially provides nested confidential computing within a single SEV-SNP VM, analogous in concept to nested virtualisation but enforced at the hardware level without the performance overhead of full nested paging.

20.5 ARM Confidential Compute Architecture: The Realm Management Extension

ARM CCA with the Realm Management Extension (RME, introduced in ARMv9.2) takes a structurally different approach from both TDX and SEV-SNP. Rather than adding a parallel enforcement mechanism alongside the existing page table and NPT hierarchy, ARM CCA extends the existing ARM world model — which already had Normal and Secure worlds from TrustZone — with two new worlds: Realm and Root.

The four-world model restructures the ARM privilege hierarchy. Normal world (EL0 through EL2) remains unchanged: this is where conventional VMs, hypervisors, and user applications run. Secure world (EL0 through EL2 in Secure state) remains the TrustZone domain for trusted applications and the trusted OS. Realm world is the new confidential computing domain: Realm VMs run at Realm EL0 and EL1, and the Realm Management Monitor (RMM) runs at Realm EL2. Root world consists of EL3 alone, which RME moves out of Secure state; it is occupied by the platform firmware (the EL3 monitor), which controls the Granule Protection Table and initialises the RMM before handing off to the Normal world hypervisor. Critically, the Normal world hypervisor at Non-secure EL2 does not have Root-world privileges and cannot modify the GPT; only the Root-world firmware can do so.

The Granule Protection Table is the ARM analogue of TDX's PAM table and AMD's RMP, but with a fundamental implementation difference: the GPC check occurs after the MMU walk resolves a physical address, enforced by the hardware Memory System rather than by the CPU pipeline. Every 4 KB physical granule has a GPT state: GPT_NS (Normal world accessible), GPT_S (Secure world only), GPT_REALM (Realm world only), GPT_ROOT (Root world only), or GPT_ANY (accessible from any world, used for shared memory). When the memory system delivers a completed PA from the MMU — whether from stage-1, stage-2, or nested translation — it performs a Granule Protection Check before issuing the physical memory transaction. If the current world does not match the GPT entry for that PA, a Granule Protection Fault (GPF) is raised.
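The Granule Protection Check reduces to a lookup and a comparison. The sketch below is a simplified model rather than Arm's specification text: the string names are hypothetical (`NS`, `SECURE`, `REALM`, `ROOT`, and `ALL_ACCESS` correspond to GPT_NS, GPT_S, GPT_REALM, GPT_ROOT, and GPT_ANY above), it assumes the RME rule that Secure and Realm states may also target the Non-secure address space while Root may target any, and it ignores GPT caching and configurable granule sizes.

```python
# Physical address spaces (PAS) an access can target.
NS, SECURE, REALM, ROOT = "NS", "Secure", "Realm", "Root"
# GPT states beyond the four PAS values.
ALL_ACCESS, NO_ACCESS = "AllAccess", "NoAccess"

# Which PAS each security state may generate accesses to.
PERMITTED_PAS = {
    "Normal": {NS},
    "Secure": {SECURE, NS},
    "Realm": {REALM, NS},
    "Root": {NS, SECURE, REALM, ROOT},
}


def granule_protection_check(world: str, target_pas: str, gpt_state: str) -> bool:
    """Return True if the access may proceed, False if it raises a GPF."""
    if target_pas not in PERMITTED_PAS[world]:
        raise ValueError(f"{world} state cannot generate {target_pas} accesses")
    if gpt_state == NO_ACCESS:
        return False
    return gpt_state in (ALL_ACCESS, target_pas)
```

Note that the stage-2 tables never appear as an input: however the Normal world hypervisor maps memory, an NS-space access to a Realm granule fails the comparison.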

Figure 20.4: ARM CCA four-world model with Realm Management Extension. Two new worlds (Realm and Root) join Normal and Secure. The Granule Protection Check (GPC), performed by the hardware Memory System after the MMU walk resolves a physical address, prevents Normal-world accesses to GPT_REALM granules — even if a stage-2 page table entry maps to Realm physical memory.

The placement of the GPC on the completed physical address has a security consequence that distinguishes CCA's design from TDX and SEV-SNP: the check is defined to gate the physical memory transaction itself. In TDX, the PAM table is maintained and consulted by the TDX Module in SEAM mode when page ownership changes, so per-access protection rests on the Secure EPT and the hardware restriction of private KeyIDs rather than on a PAM lookup. In SEV-SNP, the RMP check is a hardware lookup applied to the HPA produced by the nested page table walk. In CCA, the GPC occurs after the full address translation has resolved a physical address and before that address is sent to the cache or DRAM controller. In an implementation that honours this ordering, speculative execution that resolves an address to a Realm granule encounters a GPF at the memory-system boundary: there is no speculative window during which the Normal world can observe Realm data in a cache, because the GPC gates the cache lookup itself. As with any speculation guarantee, whether a particular core achieves this is an implementation property that must be verified per microarchitecture.

ARM CCA attestation uses the CCA attestation token, a CBOR/COSE structure with two parts: a Realm token, signed by the RMM with a Realm Attestation Key (RAK), and a platform token, signed with the CCA Platform Attestation Key (CPAK), which binds the RAK to the attested platform. Together they include measurements of the Realm image, the RMM, and the platform firmware, enabling a remote relying party to verify the complete software stack running in a Realm VM before provisioning secrets to it. The Realm's initial measurement is computed incrementally as pages are populated into the Realm's initial state, analogous to TDX's MRTD, which the TDX Module extends as the VMM adds and measures initial TD pages (TDH.MEM.PAGE.ADD and TDH.MR.EXTEND).
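Incremental measurement is a hash-extend chain. This is a generic sketch, not the exact CCA or TDX algorithm: the SHA-256 choice, the 8-byte little-endian GPA encoding, and the function names are assumptions. It shows the property attestation relies on: the final value depends on every page's contents, its address, and the order in which pages were added.

```python
import hashlib
from typing import Iterable, Tuple


def extend(measurement: bytes, gpa: int, content: bytes) -> bytes:
    """Fold one page (its GPA and contents) into a running measurement."""
    page_digest = hashlib.sha256(gpa.to_bytes(8, "little") + content).digest()
    return hashlib.sha256(measurement + page_digest).digest()


def measure_initial_image(pages: Iterable[Tuple[int, bytes]]) -> bytes:
    measurement = bytes(32)  # start from all zeros
    for gpa, content in pages:
        measurement = extend(measurement, gpa, content)
    return measurement
```

Because each step hashes the previous value, a verifier that knows the expected image can recompute the chain, and no single page can be altered, moved, or reordered without changing the result.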

20.6 GPU Confidential Computing — The New Frontier

Extending confidential computing from CPUs to GPUs introduces engineering challenges that the CPU CC implementations do not fully address. A CPU TEE must encrypt host DRAM pages belonging to the protected VM. A GPU CC implementation must additionally encrypt GPU HBM (physically separate from CPU DRAM), protect the PCIe or NVLink bus between CPU and GPU (which carries model weights, activations, and KV-cache data for AI workloads), and integrate with the CPU CC environment's attestation chain so that a remote verifier can establish trust in the GPU's execution context as well as the CPU's.

NVIDIA introduced GPU confidential computing with the H100 (Hopper, 2023). When CC mode is enabled, the H100 operates inside a trust boundary that extends from the TDX or SEV-SNP environment on the host CPU to the GPU's firmware and memory. Because the CPU CC environment cannot expose its private memory to device DMA, all host-GPU transfers pass through encrypted bounce buffers in shared memory: the driver inside the CPU TEE encrypts data with a session key negotiated with the H100 using SPDM (Security Protocol and Data Model) over PCIe; the ciphertext traverses the PCIe bus; and the GPU decrypts it into a protected region of its HBM. The H100 does not encrypt HBM itself. Instead, the compute protected region is isolated by hardware access controls that deny the host, the hypervisor, and other contexts access to it, and the HBM stacks sit on the same package as the GPU die, which makes probing the memory bus physically impractical. GPU kernel code and model weights loaded into VRAM are therefore protected against a compromised hypervisor or host driver that would otherwise read GPU memory or control DMA into it.

Measured performance data reveals a critical limitation of current GPU CC implementations. For single-GPU LLM inference workloads, the overhead of CC mode is below five percent: encrypting PCIe transfers adds latency mainly when model weights are first loaded, and subsequent inference queries operate on weights already resident in protected HBM with negligible overhead. However, distributed data-parallel training across four GPU TEEs produces an 8× average and 41.6× maximum slowdown compared to non-CC training. The source of this overhead is the MAC (Message Authentication Code) generation and verification required for every 256-byte data chunk exchanged in the ring all-reduce collective across GPU TEEs. In ring all-reduce, every gradient tensor is partitioned and each partition traverses all GPUs in sequence; with CC enabled, each partition is MAC-protected at every hop around the ring, multiplying the MAC work by the number of GPUs and the number of ring passes. This makes GPU CC currently impractical for large-scale distributed training workloads while remaining viable for inference.
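The multiplication of MAC work in ring all-reduce can be made concrete by counting. The sketch assumes the textbook ring algorithm (N-1 reduce-scatter steps followed by N-1 all-gather steps, each moving one 1/N shard per GPU) and the 256-byte chunk size quoted above; it counts operations only and says nothing about their cost.

```python
import math


def ring_allreduce_mac_ops(gradient_bytes: int, num_gpus: int, chunk_bytes: int = 256) -> dict:
    """Count MAC generations and verifications for one ring all-reduce.

    Each GPU sends one shard of gradient_bytes / num_gpus in each of the
    2 * (num_gpus - 1) ring steps; the sender MACs every chunk it sends and
    the receiver verifies every chunk it receives.
    """
    shard_bytes = math.ceil(gradient_bytes / num_gpus)
    chunks_per_shard = math.ceil(shard_bytes / chunk_bytes)
    steps = 2 * (num_gpus - 1)
    per_gpu = chunks_per_shard * steps
    return {
        "per_gpu_generate": per_gpu,
        "per_gpu_verify": per_gpu,
        "total": 2 * per_gpu * num_gpus,
    }


print(ring_allreduce_mac_ops(1 << 30, 4))
```

For a 1 GiB gradient across four GPUs this gives about 6.3 million MAC generations and as many verifications per GPU, and the whole count recurs on every training iteration.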

TunneLs (CCS 2023) revealed a separate limitation that GPU CC does not address. On the NVIDIA A100, MIG (Multi-Instance GPU), marketed as providing hardware isolation between GPU instances for multi-tenant inference, still permits cross-instance covert channels at 31 kilobits per second. The attack exploits address-translation structures (GPU TLBs) that MIG does not partition between instances: a timing channel through shared microarchitectural state, structurally analogous to the CPU cache-timing channels examined in Chapter 18. GPU CC protects the contents of GPU memory and encrypts PCIe traffic, but it does not partition shared microarchitectural resources such as translation caches, memory bandwidth, or power management subsystems. Colluding processes in different MIG instances can therefore communicate through this shared state, and the same state is a potential side channel through which one tenant observes another's memory activity. GPU CC and MIG isolation address different threat models: CC addresses a compromised driver or hypervisor, while MIG isolation (imperfect as it is) addresses co-tenant visibility. Neither fully addresses both simultaneously.

AMD GPU confidential computing on the Instinct MI300 series integrates more tightly with the CPU-side SEV-SNP architecture than NVIDIA's approach. The AMD GPU uses the same ASID-keyed encryption mechanism as AMD CPU memory when operating in CC mode, with the GPU's video memory treated as an extension of the SEV-SNP memory domain. This provides a more unified trust model — one key management hierarchy for both CPU DRAM and GPU HBM — at the cost of tighter coupling to the AMD SEV-SNP infrastructure that is not present on non-AMD systems.

20.7 Confidential CXL: The Open Problem

Chapter 19 established that CXL Type 3 memory appears to the OS as a CPU-less NUMA node: physically disaggregated DRAM attached over a CXL link, appearing to the processor as ordinary cacheable memory. The CC implementations examined in this chapter assume that memory encryption keys are managed by the CPU's on-die memory controller. When a CXL Type 3 device provides the physical memory, as a separate component reached over CXL and possibly through a switch, the encryption key management must extend across the CXL link, and the integrity of the physical-address-to-owner mapping must be maintained across a physically distributed fabric that the CC hardware was not designed to span.

The attack surface introduced by CXL is qualitatively different from the on-die DRAM case. A compromised CXL switch in the fabric can observe physical addresses on the CXL bus even if the data payload is encrypted. Knowing the physical address of a CC guest's memory transactions reveals information about the guest's memory access patterns — a physical-address side channel that is orthogonal to the data confidentiality provided by AES encryption. For CC guarantees to extend to CXL memory, integrity protection must accompany encryption: the physical address, not just the data content, must be authenticated at every point in the CXL path.

TEE I/O, built on the TEE Device Interface Security Protocol (TDISP) standardised by PCI-SIG and implemented in vendor architectures such as Intel TDX Connect and AMD SEV-TIO, extends TDX and SEV-SNP so that trusted devices, including CXL-attached ones, can DMA directly into confidential VM memory. TDISP requires that the device's firmware be attested before DMA is permitted: the device must complete an SPDM-based authentication and measurement exchange, and its link must be protected with IDE (Integrity and Data Encryption), before the trusted CC firmware configures the IOMMU to grant it access to CC memory ranges. DMA from an accepted device is then checked against the platform's ownership metadata (for example, the SEV-SNP RMP) before the memory controller accepts it.

CXL 3.0 (ratified 2022, first silicon expected 2025–2026) extends the problem further. CXL 3.0 fabric topologies allow a single CXL memory pool to be accessed coherently by multiple host processors. In a multi-host CXL 3.0 fabric where each host runs CC workloads with its own key management domain, the CXL switch must maintain isolation between hosts' private memory regions while providing cache-coherent access to regions the hosts deliberately share. No published specification as of 2025 addresses how per-host encryption keys are managed in a CXL 3.0 fabric, how TLB shootdowns propagate across hosts for CC-protected shared pages, or how the IOMMU isolation model extends to CXL-attached devices shared between multiple CC hosts. This is an open research area at the intersection of Chapter 19's CXL architecture and this chapter's CC enforcement mechanisms.

20.8 Performance: What Confidential Computing Costs in Production

The performance overhead of confidential computing arises from three distinct sources: the additional hardware checks performed on every memory access (PAM/RMP/GPC), the encryption and decryption of memory contents by the memory controller (TME-MK/SME), and the protocol overhead of the CC-aware I/O path that replaces direct shared memory with encrypted bounce buffers for device communication. These sources have very different cost profiles, and understanding which dominates for a given workload is essential for production deployment decisions.

Li et al. (ASPLOS 2023) measured TDX and SEV-SNP overhead across a representative set of server workloads. Memory-bound workloads — databases performing large sequential scans, key-value stores with high cache hit rates — incur three to eight percent overhead, dominated by the memory controller encryption latency on cache misses. CPU-bound workloads (compilation, batch ML inference on CPU) incur less than two percent overhead because the encryption cost is amortised over many compute cycles between memory accesses. I/O-heavy workloads — network-intensive services and database write paths — incur five to fifteen percent overhead from the shared memory transition protocol: when a CC guest shares a virtio queue with the VMM, it must explicitly transition pages between private and shared state, a process involving TDVMCALL.MAPGPA (TDX) or the GHCB (Guest-Hypervisor Communication Block) protocol (SEV-SNP), both of which add round-trip latency for every batch of I/O operations. Page fault handling incurs twenty to thirty percent additional latency for major faults due to the TDX Module's involvement in the page acceptance protocol: a new page must be assigned an owner in the PAM, have its KeyID established, and be accepted by the TD before it can be used, adding multiple SEAM-mode transitions to the fault resolution path.

Figure 20.5: Confidential computing performance overhead by workload type and CC mechanism, from peer-reviewed production measurements. Most single-process workloads incur less than 5% overhead. virtio/DMA paths and major-fault-heavy workloads show moderate overhead (8–20%). Distributed data-parallel GPU training across four H100 TEEs incurs 8–41× overhead from per-chunk MAC generation and verification — making CC currently impractical for large-scale distributed training.

Attestation adds a one-time cost at VM startup. Remote attestation — verifying the TD's or Realm's identity before a key management server will provision secrets — requires a network round-trip to an attestation service (Intel Attestation Service, AMD SEV Attestation Service, or Arm's equivalent) plus certificate chain validation, typically totalling 150–300 milliseconds. For long-running VMs this is negligible. For serverless functions with sub-second lifetimes, attestation cost can exceed total execution time — a fundamental architectural tension between CC security guarantees and the ephemeral, high-frequency invocation model of serverless computing.
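The tension with serverless lifetimes is a simple ratio. The sketch assumes a 200 ms attestation (inside the 150–300 ms range above) paid once, before the workload runs; the function name is hypothetical.

```python
def attestation_share(attestation_ms: float, run_ms: float) -> float:
    """Fraction of a workload's wall-clock lifetime spent on one-time attestation."""
    return attestation_ms / (attestation_ms + run_ms)


for label, run_ms in [("100 ms function", 100), ("1 s function", 1_000), ("1 h VM", 3_600_000)]:
    print(f"{label}: {attestation_share(200, run_ms):.2%}")
```

Attestation takes roughly two thirds of the lifetime of a 100 ms function but well under 0.01 percent of an hour-long VM, which is why caching attestation results or attesting a long-lived pool of workers is attractive for short invocations.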

The memory overhead from CC enforcement tables is modest but non-zero. TDX's PAM table consumes approximately 8 bytes per 4 KB physical page, about 0.2 percent of physical memory (256 MB on a 128 GB system). AMD's RMP consumes 16 bytes per 4 KB page, about 0.4 percent (512 MB on 128 GB). ARM's GPT is considerably smaller, because it stores only a 4-bit protection state per granule and can describe large regions with a single block descriptor. All three tables occupy physical memory not available to guest VMs, and because they cover every physical page of a host regardless of how many VMs it runs, the cost scales with installed DRAM rather than with VM count: every terabyte of SEV-SNP host memory carries about 4 GB of RMP before a single guest page is provisioned.
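The table sizes follow from one multiplication. The sketch uses the per-page figures given in the text for the PAM table and RMP; the GPT figure of half a byte per 4 KB granule (one 4-bit state, no block descriptors) is an assumption that gives an upper bound for the GPT's fine-grained level.

```python
PAGE_SIZE = 4096
MiB, GiB, TiB = 1 << 20, 1 << 30, 1 << 40


def table_overhead(phys_bytes: int, bytes_per_page: float) -> int:
    """Bytes consumed by a per-page metadata table covering all of physical memory."""
    return int((phys_bytes // PAGE_SIZE) * bytes_per_page)


for name, per_page in [("PAMT (8 B/page)", 8), ("RMP (16 B/page)", 16), ("GPT (4 bits/granule)", 0.5)]:
    print(f"{name}: {table_overhead(128 * GiB, per_page) / MiB:.0f} MiB per 128 GiB, "
          f"{table_overhead(1 * TiB, per_page) / GiB:.2f} GiB per TiB")
```

Because `phys_bytes` is the only input besides the per-page cost, the overhead is a fixed fraction of installed memory however the host is partitioned into VMs.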

20.9 OS and Hypervisor Support: Linux CC Architecture

A Linux kernel running inside a TDX Trust Domain or SEV-SNP confidential VM must be significantly modified from its conventional counterpart. The standard Linux boot sequence, memory allocator, hypercall interface, and I/O subsystem all assume a trusted hypervisor — assumptions that are structurally false in a CC environment. The Linux CC support, merged progressively from kernel 5.19 through 6.7, rewrites these assumptions one component at a time.

CC detection occurs early in the boot sequence, before the memory allocator is initialised. For TDX guests, the kernel checks CPUID leaf 0x21 for the "IntelTDX" signature and queries the TD's configuration via TDCALL[TDG.VP.INFO]. For SEV-SNP guests, the kernel checks CPUID leaf 0x8000001F for SEV support and then reads the SEV_STATUS Model Specific Register (MSR C001_0131h), whose low bits report whether SEV, SEV-ES, and SEV-SNP are active. Once CC is detected, the kernel must ensure that all guest-private memory is validated before first use: in SEV-SNP by executing PVALIDATE on each page, and in TDX by accepting each page with TDCALL[TDG.MEM.PAGE.ACCEPT]. Linux can do this eagerly at boot or lazily through its unaccepted-memory support (merged in kernel 6.5), which accepts memory in chunks as the page allocator first hands it out, spreading the cost of validation over the VM's early lifetime instead of paying it all at boot.
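The decoding half of this detection logic can be sketched as follows. Obtaining the raw register values requires executing CPUID and RDMSR inside the guest, which is outside the sketch; the function names are hypothetical, while the signature string and the SEV_STATUS bit positions (bit 0 SEV, bit 1 SEV-ES, bit 2 SEV-SNP) follow the conventions the Linux kernel uses.

```python
import struct

TDX_SIGNATURE = b"IntelTDX    "  # "IntelTDX" padded with four spaces


def is_tdx_guest(ebx: int, ecx: int, edx: int) -> bool:
    """Decode CPUID leaf 0x21, subleaf 0: signature bytes come from EBX, EDX, ECX in that order."""
    return struct.pack("<III", ebx, edx, ecx) == TDX_SIGNATURE


SEV_STATUS_BITS = {"SEV": 0, "SEV-ES": 1, "SEV-SNP": 2}


def sev_features(sev_status: int) -> set:
    """Decode the low bits of the SEV_STATUS MSR (C001_0131h)."""
    return {name for name, bit in SEV_STATUS_BITS.items() if (sev_status >> bit) & 1}
```

A guest that finds neither signature nor SEV bits continues as a conventional VM; one that finds them switches to the CC-aware paths described in this section.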

The hypercall interface is replaced entirely. In conventional Linux, the guest invokes hypercalls via VMCALL (Intel) or VMMCALL (AMD) with arguments in registers, which the hypervisor reads directly from the guest's register state on VM exit. In a CC environment, the hypervisor cannot read guest registers (SEV-ES and SEV-SNP encrypt the guest's saved register state; TDX exposes only a limited set of exit reasons to the VMM). Linux CC guests therefore use explicit guest-hypervisor protocols: the GHCB (Guest-Hypervisor Communication Block), a shared, unencrypted page through which an SEV-ES or SEV-SNP guest passes hypercall arguments, and TDVMCALL for TDX, in which the guest chooses which registers to expose to the VMM. Because both the GHCB page and the TDVMCALL arguments are visible to the untrusted hypervisor, the guest places in them only the values the requested operation needs, and must validate every value the hypervisor returns.

Device I/O presents the most complex integration challenge. Virtio queues, the standard paravirtualised I/O mechanism between Linux guests and KVM, require memory shared between the guest and the hypervisor, but in a CC environment guest memory is private by default and cannot be read by the hypervisor. The solution is an explicit sharing protocol: the guest uses its CC mechanism's page-conversion interface to make selected pages visible to the hypervisor (a shared RMP entry with the C-bit cleared in SEV-SNP, a Shared-EPT mapping in TDX, GPT_ANY granules in CCA). In Linux this conversion is performed through set_memory_decrypted(), and a pool of converted pages backs the SWIOTLB bounce buffer through which virtio DMA is routed. The guest must therefore copy I/O data between its private memory and the shared pages on every I/O operation. Trusted device I/O based on TDISP, which would let an attested device DMA directly into private memory, is under active development as of 2025 and is expected to reduce this overhead substantially for I/O-intensive CC workloads.
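The bounce-buffer cost is easiest to see with a byte counter. Below is a minimal sketch of a SWIOTLB-style pool, assuming fixed-size slots carved from memory that has already been converted to shared; `BounceBufferPool` and its methods are hypothetical names, not the Linux SWIOTLB API.

```python
class BounceBufferPool:
    """Hypothetical SWIOTLB-style pool of guest pages already shared with the host."""

    def __init__(self, slot_size: int, num_slots: int) -> None:
        self.slot_size = slot_size
        self.shared = bytearray(slot_size * num_slots)  # host-visible memory
        self.free_slots = list(range(num_slots))
        self.bytes_copied = 0  # the extra copy cost paid on every I/O

    def map_for_device(self, private_buf: bytes) -> int:
        """Copy private data into a free shared slot; return the slot's offset."""
        if len(private_buf) > self.slot_size:
            raise ValueError("buffer larger than one bounce slot")
        if not self.free_slots:
            raise RuntimeError("bounce pool exhausted")
        offset = self.free_slots.pop() * self.slot_size
        self.shared[offset:offset + len(private_buf)] = private_buf
        self.bytes_copied += len(private_buf)
        return offset

    def unmap_from_device(self, offset: int, length: int) -> bytes:
        """Copy data out of the shared slot into private memory and free the slot."""
        data = bytes(self.shared[offset:offset + length])
        self.bytes_copied += length
        self.free_slots.append(offset // self.slot_size)
        return data


pool = BounceBufferPool(slot_size=2048, num_slots=4)
off = pool.map_for_device(b"request")       # guest -> shared: copy 1
pool.shared[off:off + 8] = b"response"      # device/host writes its reply
reply = pool.unmap_from_device(off, 8)      # shared -> guest: copy 2
print(reply, pool.bytes_copied)  # b'response' 15
```

A request/response round trip costs one copy in each direction on top of the device transfer itself; trusted device I/O aims to remove exactly these two copies.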

20.10 Chapter Summary

Confidential computing represents a fundamental restructuring of the trust hierarchy that has governed virtualised systems since the introduction of hardware-assisted virtualisation in 2005. The conventional model trusted the hypervisor with full memory visibility — appropriate when the hypervisor was an operator-controlled component on dedicated hardware. In the public cloud era, where the hypervisor belongs to a third-party provider running workloads from mutually untrusting tenants, this trust assumption is increasingly difficult to sustain.

Three hardware implementations have reached production deployment, each choosing a different point in the design space. TDX delegates memory ownership tracking to a software module (the TDX Module) running in a new CPU mode (SEAM), keeping hardware changes minimal while adding a small trusted computing base in firmware. SEV-SNP embeds ownership directly in a hardware table (the RMP), which the CPU checks on memory accesses; this avoids a TDX-Module-style software layer on the host CPU at the cost of 16 bytes of RMP metadata per 4 KB physical page, although the AMD Secure Processor (AMD-SP) firmware remains in the trusted computing base for guest launch, key management, and attestation. ARM CCA adds a new security state (the Realm world) and a hardware Granule Protection Check (GPC) enforced by the memory system after address translation, integrating with the existing ARM world model and providing the strongest speculative-execution safety guarantees of the three implementations.
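The RMP's 16-bytes-per-page cost is a fixed fraction of installed memory, which is easy to make concrete. The arithmetic below assumes one entry per 4 KB page and ignores any fixed-size header the table may carry; the DRAM sizes are illustrative, not measurements of a particular server.

```python
PAGE_BYTES = 4 * 1024
RMP_ENTRY_BYTES = 16
GIB = 1024 ** 3


def rmp_table_bytes(dram_bytes: int) -> int:
    """One 16-byte RMP entry per 4 KB page (ignoring any fixed-size header)."""
    return (dram_bytes // PAGE_BYTES) * RMP_ENTRY_BYTES


for tib in (1, 2, 6):
    dram = tib * 1024 * GIB
    print(f"{tib} TiB DRAM -> {rmp_table_bytes(dram) // GIB} GiB RMP "
          f"({RMP_ENTRY_BYTES / PAGE_BYTES:.2%} of memory)")
```

A 1 TiB host therefore reserves 4 GiB for the RMP, about 0.39% of its memory, regardless of how many confidential guests it actually runs.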

GPU confidential computing (NVIDIA H100 CC, AMD Instinct CC) extends these protections to the GPU compute path, isolating data held in GPU HBM and encrypting data crossing PCIe, with less than five percent overhead for single-GPU inference workloads. Distributed training across GPU TEEs currently incurs 8–41× overhead from per-chunk MAC verification in ring all-reduce collectives, making GPU CC viable for inference but not yet for large-scale training. The TunneLs attack (a 31 kbps covert channel across MIG instances through the GPU TLB they share) confirms that CC encryption addresses data confidentiality but not microarchitectural side channels through shared hardware structures, an open problem that neither GPU CC nor MIG isolation fully resolves.
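Why per-chunk authentication weighs so heavily on training becomes clearer from the structure of ring all-reduce: with n GPUs, every GPU receives 2(n − 1) chunks per collective, and in a TEE each one must be verified before it is reduced or stored. The sketch below is a plain-Python ring all-reduce with a MAC on every hop. It is an illustrative model only: HMAC-SHA256 from the standard library stands in for the authenticated encryption used by real GPU TEEs, chunk payloads travel in the clear, and the 8–41× figure above comes from the cited measurements, not from this code.

```python
import hashlib
import hmac


def seal(key: bytes, values: list[int]) -> tuple[bytes, bytes]:
    """Serialise a chunk and attach a MAC (HMAC-SHA256 stands in for an AEAD tag)."""
    data = b"".join(v.to_bytes(8, "little", signed=True) for v in values)
    return data, hmac.new(key, data, hashlib.sha256).digest()


def unseal(key: bytes, data: bytes, tag: bytes) -> list[int]:
    """Verify the MAC before the receiver is allowed to use the chunk."""
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC verification failed")
    return [int.from_bytes(data[i:i + 8], "little", signed=True)
            for i in range(0, len(data), 8)]


def ring_allreduce(vectors: list[list[int]], key: bytes) -> tuple[list[list[int]], int]:
    """Sum-all-reduce over n ranks; every chunk hop is sealed and verified."""
    n, length = len(vectors), len(vectors[0])
    if length % n:
        raise ValueError("vector length must be divisible by the number of ranks")
    c = length // n
    bufs = [list(v) for v in vectors]
    verifications = 0

    def chunk(rank: int, k: int) -> list[int]:
        return bufs[rank][k * c:(k + 1) * c]

    # Phase 1, reduce-scatter: at step s, rank r sends chunk (r - s) mod n to r + 1.
    for s in range(n - 1):
        msgs = [((r + 1) % n, (r - s) % n, seal(key, chunk(r, (r - s) % n)))
                for r in range(n)]
        for dst, k, (data, tag) in msgs:
            for i, v in enumerate(unseal(key, data, tag)):
                bufs[dst][k * c + i] += v
            verifications += 1

    # Phase 2, all-gather: at step s, rank r forwards chunk (r + 1 - s) mod n.
    for s in range(n - 1):
        msgs = [((r + 1) % n, (r + 1 - s) % n, seal(key, chunk(r, (r + 1 - s) % n)))
                for r in range(n)]
        for dst, k, (data, tag) in msgs:
            bufs[dst][k * c:(k + 1) * c] = unseal(key, data, tag)
            verifications += 1

    return bufs, verifications


ranks = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
result, macs = ring_allreduce(ranks, key=b"session-key")
print(result[0], macs)  # [1111, 2222, 3333, 4444] 24
```

Four ranks perform 2 × 4 × 3 = 24 verifications for a single collective, and a training step issues many such collectives, one per gradient bucket, so the per-hop cryptographic cost accumulates quickly.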

CXL memory confidentiality is the next open frontier. The CC enforcement mechanisms examined in this chapter (PAMT, RMP, GPT) were designed for CPU-attached DRAM and have only partial extensions (TDX-IO, TDISP) for attached devices. Extending key management and ownership enforcement to CXL 3.0 fabric topologies, where multiple hosts share coherent physical memory through a CXL switch, remains an open problem as of 2025. Chapter 21 extends the hardware security story to memory safety through CHERI capabilities and ARM MTE memory tagging, mechanisms that address intra-process memory safety rather than the inter-VM isolation that confidential computing targets.

References

Intel Corporation. Intel TDX Module Architecture Specification, Version 1.5. Intel Corporation, 2023. Available: https://www.intel.com/content/dam/develop/external/us/en/documents/tdx-module-1.5-abi-spec.pdf
AMD. AMD SEV-SNP: Strengthening VM Isolation with Integrity Protection and More. AMD White Paper, 2020 (updated 2023). Available: https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf
ARM Limited. Arm CCA Security Model, DEN0096. ARM Architecture Reference, Issue B, 2022. Available: https://developer.arm.com/documentation/den0096
ARM Limited. Arm Architecture Reference Manual Supplement — Realm Management Extension (RME), for Armv9-A architecture profile. ARM DDI 0615A, 2022.
Li, M., Zhang, Y., Zhu, H., Liu, M., and Chen, H. "Understanding the Overheads of Hardware-Based Memory Isolation Mechanisms." In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '23). ACM, 2023, pp. 1–16.
Wang, Q., and Oswald, D. "Confidential Computing on Heterogeneous CPU-GPU Systems: Survey and Future Directions." arXiv preprint arXiv:2408.11601, 2024.
Zhu, J., Yin, H., Deng, P., and Zhou, S. "Confidential Computing on NVIDIA H100 GPU: A Performance Benchmark Study." arXiv preprint arXiv:2409.03992, 2024.
Lee, J., Kim, S., Park, J., and Ahn, J. "Characterization of GPU TEE Overheads in Distributed Data Parallel ML Training." arXiv preprint arXiv:2501.11771, 2025.
Deng, S., Guo, H., Ran, L., and Zheng, X. "TunneLs for Bootlegging: Undermining User Privacy in Multi-Tenant GPU Environments." In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS '23). ACM, 2023, pp. 1–18. doi:10.1145/3576915.3616609
Wilke, L., Wichelmann, J., Morbitzer, M., and Eisenbarth, T. "SEVurity: No Security Without Integrity: Breaking Integrity-Free Memory Encryption with Minimal Assumptions." In Proceedings of the 41st IEEE Symposium on Security and Privacy (S&P '20). IEEE, 2020, pp. 1483–1496. doi:10.1109/SP40000.2020.00013
Buhren, R., Werling, C., and Seifert, J.-P. "Insecure Until Proven Updated: Analyzing AMD SEV's Remote Attestation." In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS '19). ACM, 2019, pp. 1087–1099. doi:10.1145/3319535.3354216
NVIDIA Corporation. NVIDIA Hopper H100 Tensor Core GPU Architecture. NVIDIA Whitepaper, WP-10184-001, 2022. Available: https://resources.nvidia.com/en-us-tensor-core/gtc22-whitepaper-hopper
NVIDIA Corporation. NVIDIA Confidential Computing: Protecting Data In Use. NVIDIA Whitepaper, 2023. Available: https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/
CXL Consortium. CXL TEE I/O (TDISP) Specification, Revision 1.0. CXL Consortium, 2023. Available: https://www.computeexpresslink.org/
AMD. AMD I/O Virtualization Technology (IOMMU) Specification for SEV-SNP, Revision 3.00. AMD Publication 48882, 2022.
Radev, M., Schneider, T., and Brasser, F. "CAGE: Complementing Arm's CCA with GPU Extensions." In Proceedings of the 45th IEEE Symposium on Security and Privacy (S&P '24). IEEE, 2024.
Deng, S., Zhao, N., Xu, W., and Ma, Z. "IOTLB-SC: An IOMMU-based Side-Channel Attack." In Proceedings of the 44th IEEE Symposium on Security and Privacy (S&P '23). IEEE, 2023.
Intel Corporation. Intel TDX Security Analysis. Intel White Paper, Document 347931-002, 2023. Available: https://cdrdv2.intel.com/v1/dl/getContent/601292
COCONUT-SVSM Project. Coconut SVSM: A Virtual Machine Service Module for AMD SEV-SNP. Linux Foundation Confidential Computing Consortium, 2023. Available: https://github.com/coconut-svsm/svsm
Wilke, L., Wichelmann, J., Morbitzer, M., and Eisenbarth, T. "SEVurity: Attacks on SEV-ES via VMCB State Corruption." arXiv preprint arXiv:2105.13824, 2021.
McCoyd, M., Kaur, G., and Sadeghi, A.-R. "Attestation Chains for Confidential Cloud Computing." In Proceedings of the 19th Workshop on Hot Topics in Operating Systems (HotOS '23). ACM, 2023.
Shen, Y., Tian, Y., Chen, H., and Narayanan, D. "DejaVu: Oblivious Guest Execution for Confidential Virtual Machines." In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '24). ACM, 2024.