Chapter 21: Hardware Memory Safety — CHERI, MTE, and Capability-Based Addressing

21.1 Introduction: The Memory Safety Gap Below the Page Boundary

The MMU, as examined across the preceding twenty chapters, provides memory isolation at page granularity. A process cannot access another process's pages; a guest VM cannot access host memory outside its EPT mappings. This page-level isolation is the foundation of contemporary operating system security and the mechanism that makes virtualisation, containers, and confidential computing tractable. It is, however, far too coarse to provide memory safety: a typical 4 KB page holds many independently allocated objects, and the MMU cannot distinguish one from another.

Within a single page, and even within a single allocated object, the MMU provides no protection whatsoever. A C pointer that overflows a 64-byte stack buffer by one byte will access memory that is mapped, readable, and writable by the same process. The MMU will not fault. The TLB will not fault. The memory controller will not fault. The access will succeed silently, corrupting whatever data lies adjacent in memory. This is not a corner case: approximately 70% of the vulnerabilities to which Microsoft assigns a CVE each year, and approximately 70% of high-severity security bugs in the Chromium browser, are memory safety violations, as reported by Microsoft's Security Response Center in 2019 and by Google's Chromium security team in 2020.
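
The silent success of the one-byte overflow can be illustrated with a small model. The Python sketch below does not describe any real MMU; the page size, the addresses, and the helper names page_of and mmu_allows are assumptions chosen for illustration. It shows that a page-granularity check cannot tell the last byte of a buffer from the first byte beyond it.

```python
PAGE_SIZE = 4096  # 4 KB pages, the typical size assumed in this chapter


def page_of(addr: int) -> int:
    """Return the virtual page number that contains addr."""
    return addr // PAGE_SIZE


def mmu_allows(addr: int, writable_pages: set) -> bool:
    """Page-granularity check: the only question this model MMU asks."""
    return page_of(addr) in writable_pages


# Hypothetical layout: a 64-byte buffer with other data directly after it.
buf_start = 0x7FFFF100
buf_end = buf_start + 64            # first byte past the buffer
mapped = {page_of(buf_start)}       # one writable page holds both objects

in_bounds = mmu_allows(buf_end - 1, mapped)   # last valid byte of the buffer
overflow = mmu_allows(buf_end, mapped)        # one byte past the end
unmapped = mmu_allows(buf_start + PAGE_SIZE, mapped)  # a different page
print(in_bounds, overflow, unmapped)
```

Only the access that leaves the page is rejected; the in-bounds access and the overflow are indistinguishable at this granularity.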

This chapter examines the hardware mechanisms designed to close this gap by moving enforcement below the page boundary and into the pointer itself. Two complementary approaches have reached production silicon: memory tagging, which associates a small lock value with each memory granule and checks the key in every pointer that accesses it, and capability-based addressing, which encodes precise bounds and permissions directly in a fat pointer representation that the hardware validates on every load and store.

The chapter builds directly on Chapters 20 and 18. Chapter 20 established that confidential computing moves the trust boundary from the hypervisor downward to the silicon; Chapter 21 moves the protection boundary further inward, from page granularity to object granularity. Chapter 18 demonstrated that speculative execution can bypass software-enforced security policies; this chapter identifies where hardware memory tagging is similarly vulnerable to speculative bypass and where capability-based enforcement is not.

Figure 21.1: Memory safety violation taxonomy with CVE distribution (2015–2022). Spatial violations (44%) arise from pointer arithmetic beyond object bounds; temporal violations (49%) arise from accessing memory after its lifetime ends. Hardware mechanisms address these at different granularities: CHERI provides exact bounds per pointer; MTE provides probabilistic 4-bit tag checking per 16-byte granule. Earlier mitigations, both software-only (ASan, SoftBound) and hardware-assisted (Intel MPX), have proven too expensive for production deployment.

21.2 A Taxonomy of Memory Safety Violations

Memory safety violations fall into four categories with distinct hardware mitigation requirements. Spatial safety violations (44% of reported CVEs) occur when a pointer accesses memory outside the object it was originally allocated to point into. The canonical example is a buffer overflow — a write past the end of a stack array, a heap buffer, or a global variable — but the category also includes one-byte underflows, out-of-bounds reads (as in Heartbleed, CVE-2014-0160), and arbitrary pointer arithmetic that constructs a pointer to a different object entirely. Spatial violations are caused by the absence of any bounds information in conventional pointer representations: a C pointer is simply a machine-word integer, and the hardware has no record of what region it was originally derived from.

Temporal safety violations (49% of CVEs) occur when a pointer is used after the object it references has been freed or has gone out of scope. Use-after-free vulnerabilities, in which a freed heap block is reallocated for a different purpose and a stale pointer to the original allocation is subsequently dereferenced, account for approximately half of the high-severity memory safety bugs reported in Chromium. Double-free, where the same pointer is passed to free() twice, and dangling-pointer dereference on stack objects after function return are closely related. Temporal violations are fundamentally a lifetime problem: the hardware has no way to know that a pointer's target has been freed, because pointer values are integers and free() is a library call with no hardware visibility.

Type confusion violations (11%) arise when a pointer to one type is cast or reinterpreted as a pointer to a different type, enabling reads or writes through the wrong type's field layout. Uninitialised memory accesses (9%) read memory before it has been written, exposing prior heap or stack contents. Both categories are less well served by current hardware mechanisms than spatial and temporal violations, though CHERI's LoadCap and StoreCap permissions provide partial mitigation for type confusion involving capability-typed fields.

This breakdown, derived from Sutter's analysis of CVEs from 2015 to 2022, is important for understanding what hardware mechanisms actually protect against. The four shares as cited sum to more than 100%, so they are best read as overlapping categories rather than a strict partition. ARM MTE addresses spatial and temporal violations probabilistically. CHERI addresses spatial violations precisely and temporal violations through a pointer revocation mechanism. Neither mechanism addresses type confusion or uninitialised memory reads in the general case, though research extensions such as Mon CHÉRI (arXiv:2407.08663) demonstrate that conditional capabilities can detect uninitialised accesses at approximately 3.5% overhead on CHERI.

21.3 Software Mitigations and Their Limits

AddressSanitizer (ASan), developed at Google in 2012, is the most widely used software memory safety tool. ASan instruments every memory access at compile time to check a shadow memory map recording which bytes are valid; it uses red zones around allocations to detect overflows and quarantine lists to detect use-after-free. The cost is a 1.5× to 5× CPU slowdown and approximately 1.5× memory overhead, which places it firmly in the testing and debugging category: ASan is essentially never enabled in shipped builds of latency-sensitive services.

SoftBound+CETS, a research system combining spatial bounds checking via disjoint per-pointer metadata tables (SoftBound) with temporal safety via a lock-and-key identifier scheme (CETS), achieves complete spatial and temporal safety for C programs at approximately 116% overhead for the SPEC CPU2006 integer suite, that is, a slowdown of roughly 2.2×. This is comprehensive, but it falls in the same cost range as ASan and is no more suitable for production. CCured, an earlier type-system approach from Berkeley and Microsoft Research, achieved lower overhead but was not thread-safe due to non-atomic pointer access and required source annotation for legacy code.

Intel Memory Protection Extensions (MPX), introduced in Skylake (2015) and deprecated and removed in 2019, represents the hardware industry's first serious attempt to provide bounds checking in commodity x86-64 silicon. MPX added four bound registers (BND0–BND3) and BNDMOV/BNDCL/BNDCU instructions to check bounds against them. The 2018 analysis by Oleksenko, Kuvaiskii, Bhatotia, Felber, and Fetzer, published at SIGMETRICS, identifies the root causes of MPX's failure: 10–50% overhead on pointer-heavy code, inability to protect pointers passed by value on the stack, no support for bounds propagation through memcpy, significant ABI compatibility issues with unmodified libraries, and, crucially, no OS adoption. GCC, the Linux kernel, and glibc never shipped with MPX enabled by default. Compiler support was removed in GCC 9.1 in 2019. MPX is the field's primary cautionary example of what hardware bounds checking must not do, and its failure is consistent with CHERI's central design decision to embed bounds in the pointer itself rather than in separate registers.

21.4 Memory Tagging: Lock-and-Key Protection

Memory tagging is the conceptually simplest hardware approach to memory safety: associate a small metadata value, a tag, with each granule of physical memory, store a matching tag in every pointer that accesses that granule, and raise a fault when the tags do not match. Spatial safety follows when adjacent allocations carry different tags: an allocator that guarantees distinct tags for neighbouring objects catches a linear overflow into the adjacent granule deterministically, whereas an access that lands in an arbitrary other object is caught only with probability 15/16. Temporal safety is likewise probabilistic: when an object is freed and reallocated, the allocator assigns a new tag; a stale pointer from the freed allocation will carry the old tag, which will mismatch with a randomly chosen new tag with probability 15/16.
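
The lock-and-key check and the 15/16 figure can be reproduced with a toy model. The sketch below is a simplification under stated assumptions: the class TaggedMemory, its method names, the addresses, and the tag values are all hypothetical, and the allocator is assumed to choose the new tag uniformly from all 16 values.

```python
GRANULE = 16          # bytes per tagged granule (MTE-style; SPARC ADI uses 64)
NUM_TAGS = 16         # a 4-bit tag has 16 possible values


class TaggedMemory:
    """Toy model: one tag per granule, compared against the pointer's tag."""

    def __init__(self):
        self.granule_tags = {}  # granule index -> tag

    def tag_region(self, base, size, tag):
        """Assign `tag` to every granule overlapping [base, base + size)."""
        first = base // GRANULE
        last = (base + size - 1) // GRANULE
        for granule in range(first, last + 1):
            self.granule_tags[granule] = tag

    def access_ok(self, addr, pointer_tag):
        """True when the pointer's key matches the granule's lock."""
        return self.granule_tags.get(addr // GRANULE) == pointer_tag


mem = TaggedMemory()
mem.tag_region(0x1000, 64, tag=3)   # object A
mem.tag_region(0x1040, 64, tag=7)   # object B, directly after A

in_bounds = mem.access_ok(0x103F, pointer_tag=3)   # last byte of A
overflow = mem.access_ok(0x1040, pointer_tag=3)    # one byte past A, into B

# Temporal case: A is freed and retagged; the stale pointer still carries 3.
# Enumerate every tag the allocator could pick and count the mismatches.
detected = sum(1 for new_tag in range(NUM_TAGS) if new_tag != 3)
detection_probability = detected / NUM_TAGS
print(in_bounds, overflow, detection_probability)
```

The overflow is caught because the two neighbours were given different tags; the stale pointer escapes detection only in the one case out of sixteen where the new tag happens to equal the old one.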

The first production hardware memory tagging implementation was SPARC Application Data Integrity (ADI), introduced in the Oracle SPARC M7 processor in 2015. SPARC ADI uses 4-bit tags on 64-byte granules (matching the cacheline size), with the pointer's 4-bit version tag in bits [63:60] of the virtual address. The MMU checks the TTE.mcd (Memory Corruption Detection) bit per TLB entry — if set, the translation compares the pointer version tag against the physical memory tag stored in a DRAM-resident tag array. ADI operates in precise mode (exception at the faulting instruction) or imprecise mode (exception deferred). Google used SPARC ADI as an early production exploration of hardware memory tagging before ARM MTE silicon was available, as described by Serebryany in the 2018 foundational paper that defined the tag size (TS) and tag granule (TG) notation used across all subsequent tagged-memory research.

The key hardware design choice is where in the virtual address to store the pointer tag. SPARC ADI uses bits [63:60]; ARM MTE uses bits [59:56]. Both exploit the fact that 64-bit processors implement fewer than 64 bits of virtual address space, leaving upper bits available for metadata storage. The ARM architecture formalises this through Top Byte Ignore (TBI): bits [63:56] of a virtual address are stripped before the MMU uses the address for translation. This means a pointer with tag 0x3 and a pointer with tag 0x7 to the same physical location both translate to the same physical address — the MMU never sees the tag bits at all. Tag comparison is a completely separate hardware path in the memory subsystem.
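
The bit manipulation involved is small enough to show in full. The following Python sketch models an MTE-style pointer tag in bits [59:56] and the removal of the top byte before translation. The function names are assumptions for illustration, and the model covers only user-space addresses whose upper address bits are zero; real hardware extends bit 55 rather than simply zeroing the top byte.

```python
TAG_SHIFT = 56
TAG_MASK = 0xF << TAG_SHIFT         # MTE-style logical tag, bits [59:56]
TOP_BYTE_MASK = 0xFF << 56          # bits [63:56], ignored under TBI


def with_tag(pointer: int, tag: int) -> int:
    """Return `pointer` with its 4-bit tag field replaced by `tag`."""
    return (pointer & ~TAG_MASK) | ((tag & 0xF) << TAG_SHIFT)


def tag_of(pointer: int) -> int:
    """Extract the 4-bit tag carried in the pointer."""
    return (pointer >> TAG_SHIFT) & 0xF


def translation_address(pointer: int) -> int:
    """Address seen by translation in this model: the top byte is dropped."""
    return pointer & ~TOP_BYTE_MASK


plain = 0x00007FFF12345670
a = with_tag(plain, 0x3)
b = with_tag(plain, 0x7)
print(hex(a), hex(b), hex(translation_address(a)))
```

Both tagged pointers differ as integers yet yield the same address for translation, which is why the tag comparison has to happen on a separate path.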

Figure 21.2: ARM MTE lock-and-key memory tagging mechanism and its interaction with the MMU. The 4-bit pointer tag occupies bits [59:56] of the virtual address in the Top Byte Ignore region. The MMU controls whether a page participates in MTE via PROT_MTE, but does not store tag values — these reside in a DRAM-resident tag array accessed by the memory subsystem after MMU translation. The TikTag attack (arXiv:2406.08719) demonstrated SYNC mode bypass via speculative execution on Pixel 8 Cortex-X3 in 2024.

21.5 ARM MTE: Architecture, Modes, and MMU Interaction

ARM Memory Tagging Extension was introduced in ARMv8.5-A (announced in 2018) and first appeared in production in the Cortex-A510, Cortex-A710, and Cortex-X2 cores implementing the ARMv9 architecture in 2021. MTE assigns a 4-bit tag to each 16-byte aligned granule of memory. The pointer's 4-bit tag occupies bits [59:56] of the 64-bit virtual address. On every load or store to a tagged page, hardware compares the pointer tag against the memory tag; a mismatch causes a Tag Check Fault, which Linux delivers as SIGSEGV to the faulting process.

The MMU's role in MTE is enabling and scoping, not enforcement. A page participates in MTE only if it is mapped with PROT_MTE (set through mprotect()) and the system-wide SCTLR_EL1.TCF field (or SCTLR_EL1.TCF0 for EL0 accesses) enables tag checking. The TLB entry records whether a page has MTE enabled — but stores no tag values. Tag values live in a DRAM-resident tag array: 4 bits for each 16-byte granule, stored outside the normal data region. The hardware maintains a tag cache (separate from the L1/L2 data caches) that is populated on cache misses. This is the primary source of MTE overhead: on tag cache misses, the hardware must load both the data and the tag from DRAM. The 2026 extended performance analysis by Jiang et al. (arXiv:2601.11786) identifies this tag cache miss pattern as the principal microarchitectural cause of MTE overhead and provides a systematic correction of errors in prior measurements.
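
The storage cost implied by 4 bits per 16-byte granule can be worked out directly. The sketch below is plain arithmetic for a design that keeps a separate tag array; the function name tag_storage_bytes is an assumption for illustration.

```python
GRANULE_BYTES = 16   # bytes of data covered by one tag
TAG_BITS = 4         # bits of tag per granule


def tag_storage_bytes(data_bytes: int) -> int:
    """Bytes of tag storage needed to cover `data_bytes` of tagged memory."""
    granules = data_bytes // GRANULE_BYTES
    return granules * TAG_BITS // 8


per_page = tag_storage_bytes(4096)        # one 4 KB page
per_gib = tag_storage_bytes(1 << 30)      # one GiB of tagged memory
ratio = TAG_BITS / (GRANULE_BYTES * 8)    # tag bits per data bit
print(per_page, per_gib // (1 << 20), ratio)
```

A 4 KB page needs 128 bytes of tags and a gibibyte needs 32 MiB, a fixed ratio of 1/32 (about 3.1%) of the tagged data.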

MTE offers three operating modes. Synchronous mode (SYNC) raises the fault at the exact instruction that causes the mismatch, providing full diagnostic information. This mode is suitable for debug builds and serves as the strongest security mitigation, since architecturally a mismatching access never completes, at the cost of 5–10% overhead on memory-intensive workloads. Asynchronous mode (ASYNC) records the mismatch in a status register and defers fault delivery, incurring approximately 1–3% overhead, and is the mode intended for production deployments; Google shipped MTE support in Android's Scudo allocator on the Pixel 8 (2023), the first mass-market device to offer it. Asymmetric mode (ASYMM, introduced in ARMv8.7-A) makes reads synchronous and writes asynchronous, retaining precise faults on loads while avoiding the cost of synchronous checks on stores.
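
The practical difference between synchronous and asynchronous reporting can be sketched as a toy model. Everything here is an assumption made for illustration: the class ModelCore, the fault_pending flag that stands in for the status register, and the kernel_entry method that stands in for the point at which a deferred fault is noticed. Only the two basic modes are modelled.

```python
class TagCheckFault(Exception):
    pass


class ModelCore:
    """Toy core with MTE-style tag checks in 'sync' or 'async' mode."""

    GRANULE = 16

    def __init__(self, mode):
        if mode not in ("sync", "async"):
            raise ValueError("mode must be 'sync' or 'async'")
        self.mode = mode
        self.memory_tags = {}        # granule index -> tag
        self.memory = {}             # address -> stored value
        self.fault_pending = False   # stands in for the async status register

    def store(self, addr, pointer_tag, value):
        mismatch = self.memory_tags.get(addr // self.GRANULE) != pointer_tag
        if mismatch and self.mode == "sync":
            # Precise: raised at the faulting access, which does not complete.
            raise TagCheckFault(f"sync fault at {addr:#x}")
        if mismatch:
            self.fault_pending = True   # recorded now, reported later
        self.memory[addr] = value

    def kernel_entry(self):
        """Model of the point at which a deferred fault is reported."""
        if self.fault_pending:
            self.fault_pending = False
            raise TagCheckFault("async fault, faulting address not recorded")


def run(mode):
    core = ModelCore(mode)
    core.memory_tags[0x1000 // ModelCore.GRANULE] = 5
    events = []
    try:
        core.store(0x1000, pointer_tag=9, value=0xAA)   # wrong key
        events.append("store completed")
        core.kernel_entry()
    except TagCheckFault as fault:
        events.append(str(fault))
    return events, core.memory


sync_events, sync_memory = run("sync")
async_events, async_memory = run("async")
print(sync_events, async_events)
```

In the synchronous run the bad store never lands and the fault names the address; in the asynchronous run the store has already modified memory by the time the fault is reported, and the address is no longer known.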

The AmpereOne datacenter processor, launched in 2024, is the first server-class MTE implementation. Ampere's engineering team achieved a notable result: zero memory capacity overhead for tag storage through a novel hardware design that eliminates the separate DRAM tag array, and single-digit percentage overhead for SYNC mode across a broad range of datacenter workloads. Their 2025 analysis identifies application-level memory management — particularly excessive allocation/deallocation frequency — as the primary remaining overhead source, providing clear optimization targets for allocators and allocating-heavy frameworks.

A critical security limitation was demonstrated by Kim et al. in the TikTag attack (arXiv:2406.08719, 2024). On the Cortex-X3 in the Google Pixel 8, MTE synchronous mode can be bypassed via speculative execution: a speculative load before the tag check fault is delivered can carry tag information into a microarchitectural side channel, allowing an attacker to construct a tag oracle — a mechanism that reveals whether two addresses share the same tag without triggering a fault. This is structurally similar to the speculative bypasses examined in Chapter 18, applied to MTE's fault delivery rather than to page permission checks. The Android Security Team acknowledged the issue as a hardware flaw and awarded a bug bounty; the countermeasure requires software changes to MTE-based defences rather than silicon errata.

21.6 CHERI Capabilities: Fat Pointers with Hardware Enforcement

Capability Hardware Enhanced RISC Instructions is a joint project of SRI International and the University of Cambridge, initiated around 2010 and producing the longest-running hardware capability ISA in contemporary computer architecture research. The project has produced nine published ISA versions, three ISA instantiations (CHERI-MIPS, CHERI-RISC-V, and CHERI-ARMv8-A as Morello), and an extensive formal verification corpus. The most recent specification, CHERI ISA Version 9 (Watson et al., UCAM-CL-TR-987, 2023), defines the current canonical architecture.

A CHERI capability is a 128-bit fat pointer on 64-bit platforms. Unlike memory tagging, which associates metadata with the memory granule and checks a small key in the pointer, CHERI encodes the complete provenance of a pointer directly in the pointer itself. Every capability carries: a validity tag (1 bit per 16-byte capability slot, held in tag storage separate from the data), a permissions bitmask (Load, Store, Execute, LoadCap, StoreCap, Seal, and others), an object type field (used for sealed capabilities in cross-domain calls), and bounds information encoding the base address and top address of the region this pointer is permitted to access.

The validity tag is the security-critical invariant. It lives outside the 128-bit data payload, in dedicated tag storage that travels with each cache line. Hardware clears the tag whenever a capability is modified by anything other than a capability-aware instruction: an ordinary data store that overwrites any part of a capability in memory clears that location's tag, and an integer write to a capability register leaves the register untagged. Software cannot set the tag to 1 directly. The only way to hold a valid capability is to derive it from an existing valid capability, with the hardware enforcing monotonic attenuation: a capability can have its bounds narrowed or permissions removed, but never expanded. This makes CHERI capabilities unforgeable from arbitrary integers, a property that memory tagging's 4-bit tag does not provide.
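
Monotonic attenuation and tag clearing can be expressed as a small model. The sketch below is not the CHERI ISA: the Capability dataclass and the functions set_bounds, and_perms, and overwrite_with_integer are hypothetical names, and the model rejects a non-monotonic request with a Python exception where real hardware would raise a capability exception or clear the tag.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Capability:
    base: int
    top: int
    address: int
    perms: frozenset
    tag: bool = True


def set_bounds(cap, new_base, new_top):
    """Narrow the bounds; any attempt to widen them is refused."""
    if not (cap.tag and cap.base <= new_base <= new_top <= cap.top):
        raise ValueError("non-monotonic bounds request")
    return replace(cap, base=new_base, top=new_top, address=new_base)


def and_perms(cap, keep):
    """Intersect permissions: bits can be dropped but never added."""
    return replace(cap, perms=cap.perms & frozenset(keep))


def overwrite_with_integer(cap, value):
    """An integer write keeps the bits but loses the validity tag."""
    return replace(cap, address=value, tag=False)


root = Capability(0, 1 << 32, 0, frozenset({"Load", "Store"}))
buf = set_bounds(root, 0x1000, 0x1040)           # 64-byte object
read_only = and_perms(buf, {"Load", "Execute"})  # Execute cannot be gained
forged = overwrite_with_integer(buf, 0x1000)

try:
    set_bounds(buf, 0x1000, 0x2000)              # attempt to grow the bounds
    widened = True
except ValueError:
    widened = False
print(read_only.perms, forged.tag, widened)
```

Every capability in the model descends from root, and each derivation step can only shrink authority, which is the property that makes the tag meaningful.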

Capability checks are performed in the CPU execute stage, before the virtual address is sent to the MMU for translation. When a load or store instruction executes with a capability register as the base address, the hardware performs four checks in sequence: validity tag set, address at or above the capability base, address plus access size at or below the capability top, and permission bitmask authorises the operation type. Any failure raises a CHERI capability exception. This occurs before the MMU walk begins — a CHERI fault can fire on a virtual address that is validly mapped and accessible at the page-level, because CHERI and the MMU protect at fundamentally different granularities.
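
The four checks can be written out as a short function. This is a sketch of the logic only, under stated assumptions: the names Capability, CapabilityFault, and check_access are hypothetical, sealing is ignored, and real hardware evaluates the conditions in parallel rather than as sequential statements.

```python
from dataclasses import dataclass


class CapabilityFault(Exception):
    pass


@dataclass(frozen=True)
class Capability:
    tag: bool
    base: int
    top: int            # exclusive upper bound
    perms: frozenset


def check_access(cap, addr, size, perm):
    """Apply the four checks; return addr if it may proceed to translation."""
    if not cap.tag:
        raise CapabilityFault("tag violation")
    if addr < cap.base:
        raise CapabilityFault("bounds violation: below base")
    if addr + size > cap.top:
        raise CapabilityFault("bounds violation: above top")
    if perm not in cap.perms:
        raise CapabilityFault("permission violation")
    return addr


# A capability for a 64-byte buffer with load and store permission.
buf = Capability(True, 0x5000, 0x5040, frozenset({"Load", "Store"}))

last_byte = check_access(buf, 0x503F, 1, "Store")    # in bounds
try:
    check_access(buf, 0x5040, 1, "Store")            # one byte past the end
    outcome = "allowed"
except CapabilityFault as fault:
    outcome = str(fault)
print(hex(last_byte), outcome)
```

The one-byte overflow that a page-level check would allow is rejected here before any address is handed to translation.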

Figure 21.3: CHERI 128-bit capability structure. The 1-bit validity tag lives in a separate tag SRAM per cache line, outside the 128-bit data payload. The bounds field uses CHERI Concentrate compression (mantissa/exponent encoding) enabling sub-object precision. The permissions bitmask provides fine-grained control beyond R/W/X. Capability checks occur in the CPU execute stage before MMU translation — a CHERI fault can fire on a validly-mapped page. Software cannot set the tag bit through integer stores, making capabilities unforgeable.

21.7 CHERI ISA: Bounds Compression, Permissions, and Sealing

The bounds field is the most technically challenging component of the CHERI capability. Storing a full 64-bit base and a full 64-bit top alongside the 64-bit address would require 192 bits before any permissions are added, which does not fit in a 128-bit capability. CHERI Concentrate, described by Woodruff et al. in IEEE Transactions on Computers 2019, uses a floating-point-style mantissa/exponent encoding that compresses the bounds so that they fit, together with the permissions and object type, in the 64 bits left over after the address. The base and top are expressed as mantissas sharing an exponent, where the exponent specifies the least significant bit of precision and the mantissas provide the significant digits. This representation constrains the alignment of capability bounds: small objects can be bounded with byte precision, while for large objects the base and top may be rounded outward to the nearest representable values, with an alignment requirement that grows with object size but remains a tiny fraction of it. Allocators pad and align large allocations so that the rounded bounds never overlap a neighbouring object. The 2020 Cornucopia paper quantifies this rounding behaviour and confirms it does not materially weaken spatial safety for practical allocations.
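
How a shared exponent leads to outward rounding can be seen in a deliberately simplified model. The sketch below is not the CHERI Concentrate encoding: it assumes only that the length, measured in units of 2^exponent, must fit in a fixed number of mantissa bits, and that base and top must both be multiples of 2^exponent. The function name representable_bounds and the 14-bit mantissa width are assumptions for illustration.

```python
def representable_bounds(base, length, mantissa_bits=14):
    """Round [base, base + length) outward until it fits the toy encoding.

    Returns (rounded_base, rounded_top, exponent).
    """
    top = base + length
    exponent = 0
    while True:
        align = 1 << exponent
        rounded_base = base & ~(align - 1)                  # round down
        rounded_top = (top + align - 1) & ~(align - 1)      # round up
        if (rounded_top - rounded_base) >> exponent < (1 << mantissa_bits):
            return rounded_base, rounded_top, exponent
        exponent += 1


small = representable_bounds(0x1003, 61)            # a 61-byte object
large = representable_bounds(0x100001, 1 << 20)     # a 1 MiB object, odd base
slack = (large[1] - large[0]) - (1 << 20)           # bytes added by rounding
print(small, large, slack)
```

In this model the 61-byte object keeps its exact bounds, while the 1 MiB object at an unaligned base is rounded to 128-byte alignment and gains 128 bytes of slack, about 0.01% of its size. An allocator that aligns and pads large allocations accordingly ensures the slack never overlaps another object.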

The permissions bitmask provides a richer access control model than the R/W/X of page table entries. The LoadCap and StoreCap permissions control whether a capability can be used to load or store other capabilities from memory. The distinction is critical because it prevents untrusted code from reading capabilities out of memory and using them to escape its sandbox. The Seal permission enables the creation of sealed capabilities, which carry an object type assigned at sealing time; sealed capabilities cannot be dereferenced or modified but can be passed as unforgeable tokens for cross-domain calls. This enables the CCall/CReturn mechanism for safe cross-compartment procedure calls, which enforces a well-defined interface and prevents privilege escalation through the call boundary.
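
Sealing can be modelled in a few lines. The sketch below is an illustration rather than the ISA definition: the names Capability, seal, unseal, and load are hypothetical, an otype of None stands for "unsealed", and the object type is taken from the address of a sealing capability that must hold the Seal (or Unseal) permission and have that address within its bounds.

```python
from dataclasses import dataclass, replace


class CapabilityFault(Exception):
    pass


@dataclass(frozen=True)
class Capability:
    base: int
    top: int
    address: int
    perms: frozenset
    otype: object = None    # None models an unsealed capability


def _authorises(authority, perm):
    return perm in authority.perms and authority.base <= authority.address < authority.top


def seal(cap, sealer):
    """Seal `cap` with the object type named by sealer.address."""
    if cap.otype is not None or not _authorises(sealer, "Seal"):
        raise CapabilityFault("seal refused")
    return replace(cap, otype=sealer.address)


def unseal(cap, unsealer):
    """Unseal `cap`; the unsealer must name the same object type."""
    if not _authorises(unsealer, "Unseal") or unsealer.address != cap.otype:
        raise CapabilityFault("unseal refused")
    return replace(cap, otype=None)


def load(cap, addr):
    """Dereference for a one-byte load; sealed capabilities are rejected."""
    if cap.otype is not None:
        raise CapabilityFault("sealed capability cannot be dereferenced")
    if "Load" not in cap.perms or not cap.base <= addr < cap.top:
        raise CapabilityFault("bounds or permission violation")
    return addr


data = Capability(0x2000, 0x2100, 0x2000, frozenset({"Load"}))
type_authority = Capability(40, 50, 42, frozenset({"Seal", "Unseal"}))

token = seal(data, type_authority)          # opaque to the holder
try:
    load(token, 0x2000)
    token_usable = True
except CapabilityFault:
    token_usable = False
reopened = unseal(token, type_authority)    # only the type's owner can do this
print(token.otype, token_usable, load(reopened, 0x2000))
```

Code that receives the token can store it and pass it back, but cannot read through it or unseal it without the matching authority, which is what makes it usable as a cross-domain call token.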

In hybrid mode — the incremental adoption path — only selected pointers are capabilities; the rest remain conventional integers. The default data capability (DDC) register provides implicit authority for legacy integer-addressed loads and stores, allowing CHERI to coexist with unmodified libraries. In pure-capability (purecap) mode, all pointers in the entire software stack are capabilities, providing complete spatial safety and enabling the OS to enforce compartmentalization at library and process boundaries with hardware backing. The performance difference is significant: hybrid mode on Morello incurs 2–10% overhead while purecap incurs 10–50% on the SPEC CPU2006 integer suite, as characterised in the Cambridge early Morello performance report (UCAM-CL-TR-986, 2023).

21.8 Morello, CheriBSD, and CHERIoT

Arm's Morello is a 2.5 GHz four-core superscalar ARMv8-A processor incorporating full CHERI support, produced as experimental silicon in 2022 for the UK Digital Security by Design (DSbD) programme — a £187M Industrial Strategy Challenge Fund initiative. Morello is the first high-performance CHERI silicon, enabling evaluation of CHERI at realistic computational complexity rather than in FPGA soft-core environments. The IEEE Micro paper by Grisenthwaite, Barnes, Watson, Moore, Sewell, and Woodruff (2023) describes the Morello SoC design, the integration of CHERI capability checking into the Neoverse N1 microarchitecture, and initial benchmark results. Morello implements both hybrid and purecap modes; compartmentalisation via library-level sandboxing is a primary evaluation target alongside raw performance.

CheriBSD is the reference operating system for CHERI, derived from FreeBSD. It supports both modes and is the only fully functional CHERI OS with complete kernel capability support. A 2026 security analysis by Guzairov, Potanin, Kell, and Tiu (arXiv:2601.19074) identifies four compartmentalisation bypass vectors in CheriBSD and Morello Linux, demonstrating that hardware capability bounds enforcement does not automatically imply secure compartmentalisation at the software level. The findings are important calibration for expectations: CHERI prevents arbitrary memory corruption but cannot prevent information leakage through shared-memory interfaces, explicitly-passed capabilities, or shared mutable state that was designed to be cross-compartment-accessible.

CHERIoT adapts CHERI for resource-constrained embedded and IoT cores in a collaboration between Microsoft Research and the University of Cambridge. The MICRO 2023 paper by Amar, Chisnall, Chen, Filardo, Laurie, Liu, Norton, Moore, Tao, Watson, and Xia achieves complete spatial and temporal memory safety at 7–15% area overhead on a minimal RISC-V core. CHERIoT introduces several ISA revisions: a sentry mechanism for safe cross-compartment calls without a separate kernel, a revised capability permission ontology tailored to embedded use cases, and a temporal safety scheme that revokes capabilities at object granularity without a full heap scan. The CHERIoT RTOS is available as an open-source implementation at github.com/microsoft/CHERIoT-RTOS. CHERIoT is currently the most credible path to CHERI deployment in production embedded silicon.

For temporal safety on heap allocations, CHERIvoke (Xia et al., MICRO 2019) characterises the pointer revocation problem: after free(), all outstanding capabilities pointing into the freed region must have their validity tags cleared before the memory is reallocated. This requires a sweep of the heap to find and clear stale capabilities, adding approximately 2–5% overhead with sweeps triggered every ~100 milliseconds for typical server workloads. Cornucopia (Filardo et al., IEEE S&P 2020) reduces this to below 1% through a lazy revocation scheme using a quarantine sweeper and CHERI Concentrate's alignment properties to batch revocations efficiently.
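
The sweep that revocation requires can be sketched as follows. This is a toy model, not the CHERIvoke or Cornucopia implementation: memory is a Python list whose elements are either plain integers or Cap objects, the quarantine is a list of freed address ranges, and the names Cap and revoke are hypothetical. The shadow set of 16-byte granules stands in for a revocation bitmap.

```python
from dataclasses import dataclass

GRANULE = 16   # bytes covered by one entry of the toy shadow map


@dataclass
class Cap:
    base: int
    top: int
    tag: bool = True


def revoke(memory, quarantined_ranges):
    """Clear the tag of every capability whose base lies in freed memory.

    Returns the number of capabilities revoked.
    """
    shadow = set()
    for start, end in quarantined_ranges:
        shadow.update(range(start // GRANULE, (end + GRANULE - 1) // GRANULE))

    cleared = 0
    for word in memory:
        if isinstance(word, Cap) and word.tag and word.base // GRANULE in shadow:
            word.tag = False
            cleared += 1
    return cleared


memory = [
    Cap(0x1000, 0x1040),    # pointer to the object about to be freed
    42,                     # ordinary data, ignored by the sweep
    Cap(0x2000, 0x2020),    # pointer to a live object
    Cap(0x1010, 0x1020),    # sub-object pointer into the freed object
]
cleared = revoke(memory, [(0x1000, 0x1040)])
print(cleared, [w.tag for w in memory if isinstance(w, Cap)])
```

Only after the sweep completes is it safe to hand the quarantined range back to the allocator, because no tagged capability to it remains anywhere in the swept memory.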

21.9 CHERI and the MMU: Two Orthogonal Gates

The relationship between CHERI capabilities and the conventional MMU is the structural question that places this chapter within the book's larger MMU/TLB narrative. CHERI and the MMU are orthogonal mechanisms protecting at different granularities through different pipeline stages on the same virtual address space. Both must pass for an access to succeed; neither subsumes the other.

The MMU provides page-granularity isolation between address spaces. It ensures that a process cannot access another process's pages, that a guest VM cannot access host memory outside its EPT, and that kernel pages are inaccessible from user mode. These protections are enforced through the page table walk and TLB lookup that translate virtual addresses to physical addresses. The MMU's unit of protection is the page (typically 4 KB); it has no information about allocation boundaries within a page.

CHERI provides sub-page, per-pointer protection within a single address space. Its checks occur in the CPU execute stage before the virtual address reaches the MMU, so a CHERI capability exception can fire on an access to a validly-mapped, writable page if the capability's bounds do not encompass the target address. CHERI's unit of protection is the capability-bounded region, which can be as small as a single byte of a struct field. Address translation itself is unchanged by CHERI: the TLB lookup and page table walk map virtual to physical addresses exactly as before. What CHERI adds sits alongside translation rather than inside it: the memory system must store and transport the validity tags, and implementations such as Morello also add page table attributes that control whether capabilities may be loaded from or stored to a page.
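
The two-gate structure can be captured in one function. The sketch below is a model under stated assumptions: a capability is a plain tuple of (tag, base, top, perms), the page table is a dictionary from page number to a permission string, and the function name access is hypothetical.

```python
PAGE_SIZE = 4096


def access(cap, addr, size, page_table):
    """Return the outcome of a store of `size` bytes at `addr` via `cap`."""
    # Gate 1: capability check, before any translation takes place.
    tag, base, top, perms = cap
    if not tag or addr < base or addr + size > top or "Store" not in perms:
        return "capability fault"
    # Gate 2: page-granularity check on every page the access touches.
    first_page = addr // PAGE_SIZE
    last_page = (addr + size - 1) // PAGE_SIZE
    for page in range(first_page, last_page + 1):
        if "W" not in page_table.get(page, ""):
            return "page fault"
    return "ok"


page_table = {5: "RW"}                                    # page 5 is writable
buf = (True, 0x5000, 0x5040, frozenset({"Store"}))        # 64 bytes in page 5
stray = (True, 0x6000, 0x6040, frozenset({"Store"}))      # page 6 is unmapped

results = [
    access(buf, 0x5000, 8, page_table),     # both gates pass
    access(buf, 0x5040, 1, page_table),     # mapped and writable, out of bounds
    access(stray, 0x6000, 8, page_table),   # in bounds, page not mapped
]
print(results)
```

The second case is the one the MMU alone would miss, and the third is the one CHERI alone would miss; neither gate makes the other redundant.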

Figure 21.4: CHERI and the MMU operate as two independent protection gates. Gate 1 (CHERI, CPU execute stage) validates tag, bounds, and permissions before the virtual address reaches the MMU. Gate 2 (MMU, TLB) validates page presence and permissions during translation; on cores that also implement MTE, the tag check follows translation in the memory subsystem. Both must pass independently. Three genuine interactions exist: shared VA space requiring coordinated OS context switches; tag SRAM preservation during page swap (CheriBSD extends the kernel page fault path); and CHERI-without-MMU for embedded systems (arXiv:2310.00933).

Three genuine interactions exist between CHERI and the MMU at the OS level. First, capabilities encode virtual addresses and are scoped to a virtual address space in exactly the same way page table mappings are; on a context switch, the OS must manage both the page table root (CR3 on x86-64, TTBR0/TTBR1 on ARM64) and the capability register file as a coherent unit. Second, when the OS evicts a page to swap storage, the tag SRAM bits associated with that page's cache lines must be saved and restored alongside the data. CheriBSD extends the page fault handler and swap paths to handle this, but it represents a non-trivial kernel engineering obligation that any CHERI-supporting OS must address. Third, CHERI can operate as a standalone memory protection mechanism on systems without a conventional MMU: the case of securing MMU-less Linux using CHERI (Almatary et al., arXiv:2310.00933) demonstrates that CHERI provides spatial safety and compartmentalisation independently of page-based isolation, which is particularly relevant for the embedded domain where many MCUs lack an MMU.
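
The swap obligation in the second interaction can be made concrete with a toy model. The sketch below is not CheriBSD's implementation; the functions swap_out and swap_in and the packing of tag bits into a single integer are assumptions for illustration. It shows that the tags are extra state that must travel with the page, because the data bytes alone cannot reconstruct them.

```python
PAGE_SIZE = 4096
SLOT = 16                          # bytes covered by one validity tag
SLOTS_PER_PAGE = PAGE_SIZE // SLOT


def swap_out(page_data, page_tags):
    """Return what must be written to swap: the data plus packed tag bits."""
    packed = 0
    for index, tag in enumerate(page_tags):
        packed |= int(tag) << index
    return bytes(page_data), packed


def swap_in(data, packed_tags):
    """Rebuild the page and its tags from what was saved."""
    tags = [bool((packed_tags >> i) & 1) for i in range(SLOTS_PER_PAGE)]
    return bytearray(data), tags


# A page in which slots 0 and 7 currently hold valid capabilities.
data = bytearray(PAGE_SIZE)
tags = [False] * SLOTS_PER_PAGE
tags[0] = tags[7] = True

saved_data, saved_tags = swap_out(data, tags)
restored_data, restored_tags = swap_in(saved_data, saved_tags)

# A swap path that saved only the data would have to restore every tag as 0.
naive_data, naive_tags = swap_in(saved_data, 0)
print(restored_tags == tags, any(naive_tags), SLOTS_PER_PAGE // 8)
```

A 4 KB page carries 256 tag bits, so the extra state is 32 bytes per page; without it, every capability on the page would come back untagged and unusable.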

Figure 21.5: Hardware memory safety mechanism deployment timeline and overhead comparison. SPARC M7 ADI (2015) was the first production tagged-memory silicon. ARM MTE reached production in Pixel 8 (2023) and the AmpereOne datacenter processor (2024). CHERI-Morello (2022) is the first high-performance capability silicon, with hybrid mode reducing overhead to 2–10%. CHERIoT (MICRO 2023) achieves complete memory safety for embedded cores at 7–15% area overhead. Intel MPX (2015–2019) is the cautionary tale: deprecated after producing 10–50% overhead with no OS adoption.

21.10 Chapter Summary

Hardware memory safety mechanisms address the structural gap that the MMU cannot close: within a single page, within a single process's address space, the MMU has no mechanism to prevent a pointer from accessing memory outside the object it was allocated for. This gap accounts for approximately 70% of CVEs across major codebases. The figure has been stable for a decade despite significant investment in software mitigations, reflecting the fundamental inadequacy of tools like AddressSanitizer (1.5–5× slowdown), SoftBound+CETS (116% overhead), and Intel MPX (10–50%, deprecated 2019) for production deployment.

Two complementary hardware approaches have reached production silicon. Memory tagging assigns a small lock value to each memory granule and stores a matching key in the pointer. SPARC ADI (Oracle SPARC M7, 2015) was the first deployed implementation. ARM MTE (ARMv8.5-A, Pixel 8 2023, AmpereOne 2024) assigns 4 bits per 16-byte granule, stores the pointer tag in bits [59:56] of the virtual address using the Top Byte Ignore mechanism, and checks the tag in the memory subsystem after MMU translation. The MMU controls whether a page participates in MTE, through a tagged-memory attribute in the page table entry that Linux sets when a mapping is created with PROT_MTE, but stores no tag values; these reside in a DRAM-resident tag array accessed independently of the TLB. MTE ASYNC overhead is 1–3% in production; the TikTag speculative execution bypass (arXiv:2406.08719, 2024) demonstrated that SYNC mode can be defeated via side channel on Cortex-X3.

CHERI encodes precise bounds and permissions directly in a 128-bit fat pointer. The validity tag in a per-cache-line SRAM prevents capability forgery from integers. Capability checks occur in the CPU execute stage before MMU translation — CHERI and the MMU are orthogonal gates that both must be passed independently. The Arm Morello processor (2022) is the first high-performance CHERI silicon; CheriBSD provides the reference OS with hybrid and pure-capability modes. CHERIoT (MICRO 2023, Microsoft Research and Cambridge) achieves complete spatial and temporal memory safety at 7–15% area overhead for embedded cores. Temporal safety for heap allocations requires pointer revocation; CHERIvoke characterises the overhead and Cornucopia reduces it below 1% through lazy sweeping.

The architectural relationship between these mechanisms and the MMU is precise and important. Memory tagging extends the MMU's page-attribute model with a per-granule lock stored outside the TLB. CHERI operates entirely outside the MMU pipeline, adding a pre-translation gate in the execute stage. Both require OS-level support for tag preservation through page eviction and for coherent management alongside the virtual address space. Chapter 22 extends this analysis to RISC-V's H-extension virtualisation, where the interaction between the two-level address translation hierarchy and memory safety mechanisms in hosted VMs raises new open questions.

References

Watson, R.N.M., Woodruff, J., Neumann, P.G., Moore, S.W., Anderson, J., Chisnall, D., Davis, B., Laurie, B., Roe, M., et al. (2015). CHERI: A Hybrid Capability-System Architecture for Scalable Software Compartmentalization. Proceedings of IEEE Symposium on Security and Privacy (S&P 2015), pp. 20–37. doi:10.1109/SP.2015.9
Woodruff, J., Watson, R.N.M., Chisnall, D., Moore, S.W., Anderson, J., Davis, B., Laurie, B., Neumann, P.G., Norton, R., and Roe, M. (2014). The CHERI Capability Model: Revisiting RISC in an Age of Risk. Proceedings of ISCA 2014, pp. 457–468. doi:10.1145/2678373.2665740
Watson, R.N.M., Neumann, P.G., Woodruff, J., et al. (2023). Capability Hardware Enhanced RISC Instructions: CHERI Instruction-Set Architecture (Version 9). University of Cambridge Technical Report UCAM-CL-TR-987.
Grisenthwaite, R., Barnes, G., Watson, R.N.M., Moore, S.W., Sewell, P., and Woodruff, J. (2023). The Arm Morello Evaluation Platform — Validating CHERI-Based Security in a High-Performance System. IEEE Micro, 43(3), pp. 50–57. doi:10.1109/MM.2023.3264676
Amar, S., Chisnall, D., Chen, T., Filardo, N.W., Laurie, B., Liu, K., Norton, R., Moore, S.W., Tao, Y., Watson, R.N.M., and Xia, H. (2023). CHERIoT: Complete Memory Safety for Embedded Devices. Proceedings of MICRO '23, pp. 641–653. doi:10.1145/3613424.3614266
Serebryany, K. (2018). Memory Tagging and How It Improves C/C++ Memory Safety. arXiv preprint arXiv:1802.09517. [Google; foundational memory tagging comparison paper; defines TS/TG notation]
Arm Limited. (2021). Armv8.5-A Memory Tagging Extension White Paper. ARM-ECM-0854461.
Oleksenko, O., Kuvaiskii, D., Bhatotia, P., Felber, P., and Fetzer, C. (2018). Intel MPX Explained: A Cross-layer Analysis of the Intel MPX System Stack. Proceedings of the ACM on Measurement and Analysis of Computing Systems (SIGMETRICS 2018), 2(2). doi:10.1145/3224423
Filardo, N.W., Gutstein, B.F., Woodruff, J., et al. (2020). Cornucopia: Temporal Safety for CHERI Heaps. Proceedings of IEEE S&P 2020, pp. 608–625. doi:10.1109/SP40000.2020.00098
Xia, H., Clarke, J., Chisnall, D., et al. (2019). CHERIvoke: Characterising Pointer Revocation using CHERI Capabilities for Temporal Memory Safety. Proceedings of MICRO 2019. doi:10.1145/3352460.3358288
Chisnall, D., Rothwell, C., Watson, R.N.M., et al. (2015). Beyond the PDP-11: Architectural Support for a Memory-Safe C Abstract Machine. Proceedings of ASPLOS 2015. doi:10.1145/2694344.2694367
Aingaran, K., Jairath, S., Konstadinidis, G., et al. (2015). M7: Oracle's Next-Generation SPARC Processor. IEEE Micro, 35(2), pp. 36–46. doi:10.1109/MM.2015.35
Armstrong, A., Bauereiss, T., Campbell, B., Reid, A., et al. (2019). ISA Semantics for ARMv8-A, RISC-V, and CHERI-MIPS. Proceedings of POPL 2019, pp. 1–31. doi:10.1145/3290384
Jiang, Z., et al. (2026). ARM MTE Performance in Practice (Extended Version). arXiv preprint arXiv:2601.11786. [Systematic MTE overhead measurement; corrects methodological errors in prior work]
Ampere Computing. (2025). Optimised Memory Tagging on AmpereOne Processors. arXiv preprint arXiv:2511.17773. [First datacenter MTE processor; zero tag storage capacity overhead; single-digit % SYNC]
Kim, J., et al. (2024). TikTag: Breaking ARM's Memory Tagging Extension with Speculative Execution. arXiv preprint arXiv:2406.08719. [Speculative bypass of MTE SYNC on Pixel 8 Cortex-X3; Android Security bug bounty]
Guzairov, D., Potanin, A., Kell, S., and Tiu, A. (2026). A Security Analysis of CheriBSD and Morello Linux. arXiv preprint arXiv:2601.19074. [Four compartmentalisation bypass vectors in CHERI OS software]
Bauereiss, T., et al. (2024). VeriCHERI: Exhaustive Formal Security Verification of CHERI at the RTL. arXiv preprint arXiv:2407.18679. [RTL-level formal proof of CHERI security properties in Morello hardware]
Watson, R.N.M., Clarke, J., Sewell, P., et al. (2023). Early Performance Results from the Prototype Morello Microarchitecture. University of Cambridge Technical Report UCAM-CL-TR-986.
Almatary, H., et al. (2023). Securing MMU-less Linux Using CHERI. arXiv preprint arXiv:2310.00933. [CHERI as standalone protection in embedded Linux without MMU]
Microsoft Security Response Center. (2019). A Proactive Approach to More Secure Code. MSRC Blog. msrc.microsoft.com/blog/2019/07. [~70% CVE figure for memory safety]
CISA. (2023). The Urgent Need for Memory Safety in Software Products. cisa.gov. [Joint advisory with NSA, NCSC-UK citing 70% figures and recommending hardware enforcement]