Two Stories for "What is CHERI?"
Some readers might have heard of CHERI. Of those who’ve heard of it, most have probably got the rough idea that it’s a “more secure” CPU — but, after that, I find that most people start to get rather confused when they try and nail down “what is CHERI?”.
Having grappled with this myself, and having helped others grapple with it, I think that the major reason is that the high-level story requires understanding quite a bit of detail — hardware, Operating Systems (OSs), programming languages (and some end user programs) all have to be adapted for CHERI. Most people tend to focus at first solely on the CPU, but one can’t form a coherent story from the CPU alone. Those who persist and piece together the other parts of CHERI can then get swept up in the novelty of the idea, and assume that CHERI magically secures everything it touches. However, CHERI is just a tool and, like all tools, it’s better at some things than others — and, even then, only when used thoughtfully.
In this blog post I’m going to explain just enough of CHERI to put together two coherent stories for what it is and where it might be used. I’m not going to try and cover all the possibilities, since, as you’ll see, this post is long enough as it is! One reason for that is that CHERI contains overlapping concepts that can be combined in more than one way. Because of that, I’ll warn you in advance that there is a fairly hard switch between the two stories in this post — sorry in advance! I’m also not going to be afraid to diverge from CHERI’s terminology, which has a number of historical quirks that I find add to the confusion.
The rough idea
Let’s start with a quote from the main CHERI site:
CHERI extends conventional hardware Instruction-Set Architectures (ISAs) with new architectural features to enable fine-grained memory protection and highly scalable software compartmentalization. The CHERI memory-protection features allow historically memory-unsafe programming languages such as C and C++ to be adapted to provide strong, compatible, and efficient protection against many currently widely exploited vulnerabilities.
In simpler terms, a CHERI CPU is one that supports extra security features, most notably “capabilities” – pointers with extra information – that can help catch flaws such as buffer overruns before they become security problems. Programs written in languages such as C can be recompiled, mostly with few source-level changes, so that they can make use of capabilities when run.
Let’s imagine I have this seemingly innocuous C code:
char *buf = malloc(4);
strcpy(buf, "abcd");
printf("%s\n", buf);
At first glance, you could be forgiven for thinking that strcpy copies 4 bytes into buf. However, strcpy adds a NULL byte to the end of the string it copies, so 5 bytes are written into buf! This is a classic buffer overrun, of the sort that has led to a great many security flaws in real software.
When compiled for a normal CPU and OS, the chances are that this snippet will appear to run correctly, though it may corrupt the program’s internal state and cause security problems. If I compile it with a CHERI C compiler, for a CHERI CPU, and run the resulting executable on a CHERI-aware OS, the incorrect write of the fifth byte will be detected by the CPU, and the program terminated with a SIGPROT (the capability equivalent of a SEGFAULT). We can see that happening in the video below, where in the split tmux we have my normal OpenBSD desktop running in the top and an Arm CHERI machine with a CHERI-aware OS in the bottom:
The video shows OpenBSD statically warning me that strcpy is dangerous, which is clearly true — my buggy program ran to completion on what is often considered the most security conscious Unix around! But on the bottom, with a CHERI-aware OS running on CHERI hardware, the buffer overrun was detected and the program stopped before it could become a security issue. Running gdb shows that the buffer overrun (“Capability bounds fault”) is detected while strcpy is running.
I’ve been deliberately explicit that there are three “CHERI” factors involved in this: a variant of C (“CHERI C”) that knows that CHERI capabilities are pointers that carry extra information about the size of a block, and a compiler capable of dealing with this; an operating system (as in the video above; currently that typically means CheriBSD, a FreeBSD variant, although some CHERI machines now have a Linux available) whose malloc returns capabilities that record the size of the block it has allocated; and a CHERI CPU which detects the write past the end of the block.
We thus immediately see an obvious point of confusion: when we say “CHERI” do we mean the language, OS, CPU, an abstract definition of what CHERI is capable of, a model of capabilities, or some combination of all of these factors? In informal conversation, I’ve heard the term used in all of these senses — frequently, the two speakers don’t even realise that they’re using it to refer to different things! I don’t think it’s possible to nail down a single meaning for “CHERI”. Instead, I use the unqualified term “CHERI” to refer to the very high-level idea and concepts and then, when talking about a specific component, to qualify it accordingly (e.g. “CHERI CPU” or “CHERI OS” and so on).
CHERI CPUs
You might have noticed above that I said “a” CHERI CPU, not “the” CHERI CPU. We thus see another point of confusion: CHERI features can be added to any CPU. Indeed, there are at least three different CHERI CPUs available. Chronologically, the first was based on a MIPS processor, though I believe that’s no longer supported. Arm’s experimental Morello CPU is a mostly normal, fairly modern, AArch64 ARMv8-A processor, extended with CHERI features (they’re not available commercially, but are/were available to researchers such as myself). There are also at least two CHERI RISC-V processors (one desktop-class and one microcontroller-class) under design: I’m not sure if physical versions are available yet, but there is at least simulator/emulator support.
A CHERI CPU can be thought of as a combination of the traditional things that make up a CPU, plus the things necessary to support the capability features that CHERI defines. For example, Morello supports the normal AArch64 instruction set “A64” as well as a second “pure capability” instruction set “C64”: programs can switch between the two on-the-fly (though this is normally not exposed to the high-level programmer). This might sound sci-fi, but it’s not unprecedented: for example, many Arm CPUs have long been able to switch between the A32 and T32 (“Thumb”) instruction sets on-the-fly.
Practically speaking, even when taking advantage of CHERI-related features that the CPU offers, one can write “CHERI software” that is largely ignorant of the specific CHERI CPU it’s running on. There are of course some visible differences between the different CHERI CPUs, but it’s rather like the situation with normal CPUs, where I can write code that’s largely ignorant of whether it’s running on, say, AArch64 or x86_64.
Capabilities
It’s now time to define a capability. Semi-formally, I think of a capability as a token that a program can pass to the CPU to perform actions on its behalf. Unfortunately, I find that that definition only really makes sense when you already know what a capability is! To try and break that cycle, I’ll explain things using a series of gradually less inaccurate approximations.
For our first approximation, let’s say that a capability is a pointer that comes with permissions. Amongst a capability’s permissions are traditional things like “can this pointer read and/or write to the address pointed to?” If I try and use a capability in a situation for which it doesn’t have sufficient permissions, my program will be terminated with a SIGPROT.
New capabilities can be derived from existing capabilities. The simplest derivation is to copy the capability unchanged. I can also derive a new capability that has fewer permissions: for example, I can take a capability which can read and write to an address and derive a new capability which can read but not write to the address. However, I cannot add to a capability’s permissions — this property is crucial to CHERI’s security. There are some interesting implications to this. First, if a program destroys (e.g. by overwriting with NULL) the only copy it has of a powerful capability, it can never recover those permissions. Second, a CHERI system must start with a “root” or “super” capability that has the maximum set of permissions that will ever be needed.
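As a minimal sketch of what that looks like in CHERI C (assuming CheriBSD’s cheriintrin.h helpers cheri_perms_and and CHERI_PERM_STORE, whose exact names may differ in your toolchain), I can derive a read-only capability from malloc’s read/write one and watch a store through it fail:
#include <cheriintrin.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *rw = malloc(4);                        // capability with read and write permissions
    rw[0] = 'a';                                 // Fine: rw can store
    // Derive a capability with the store permission stripped; it can never be added back.
    char *ro = (char *)cheri_perms_and(rw, ~(size_t)CHERI_PERM_STORE);
    printf("%c\n", ro[0]);                       // Fine: ro can still load
    ro[0] = 'b';                                 // SIGPROTs: ro cannot store
    return 0;
}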
To understand how one can be prevented from extending a capability’s permissions, we need a slightly better approximation. On a normal CPU, pointers are interchangeable with integers representing memory addresses — I can magic an integer out of thin air and load a value from the address in memory that matches that integer. On a CHERI CPU, capabilities are not interchangeable with integers / addresses. I cannot magic a new capability out of thin air — only the CPU can create new capabilities, and it will only let me do so by deriving them from existing capabilities. This allows the CPU to enforce the rules about how an (authentic) capability can be derived from an (authentic) input capability.
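Here’s a minimal purecap sketch of that difference (the address is made up for illustration): casting an integer to a pointer still compiles, but the result is not an authentic capability, so the CPU refuses to dereference it:
#include <stdint.h>
#include <stdio.h>

int main(void) {
    // Magic an address out of thin air, as one might on a conventional CPU.
    char *p = (char *)(uintptr_t)0x200000;
    // On a purecap CHERI system no authentic capability was derived, so the
    // load is rejected by the CPU instead of silently reading whatever is there.
    printf("%c\n", p[0]);   // SIGPROTs (on a conventional CPU: undefined behaviour)
    return 0;
}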
We can improve our approximation further by realising that every location in memory that can store a capability has an authenticity bit associated with it. If a program asks the CPU to perform a capability-related operation on an inauthentic capability, it will be terminated with a SIGPROT. It might seem odd that one can have inauthentic capabilities but, amongst other things, they are needed to represent arbitrary binary data (e.g. a JPEG file in memory). I can always derive an inauthentic capability from an authentic capability, but never vice versa.
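A minimal sketch of the authenticity bit in action (assuming cheriintrin.h’s cheri_tag_get helper, which reports whether a capability is authentic): copy a capability byte-by-byte, as if it were arbitrary data, and the bytes survive but the authenticity does not:
#include <cheriintrin.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *p = malloc(4);                         // an authentic capability
    char *q;                                     // will receive only p's raw bytes
    unsigned char *src = (unsigned char *)&p;
    unsigned char *dst = (unsigned char *)&q;
    for (size_t i = 0; i < sizeof(char *); i++)  // byte-wise copy: data, but no authenticity
        dst[i] = src[i];
    printf("%d %d\n", (int)cheri_tag_get(p), (int)cheri_tag_get(q));  // prints "1 0"
    q[0] = 'x';                                  // SIGPROTs: q is an inauthentic capability
    return 0;
}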
We can now answer the seemingly simple question: how big is a capability? Let’s first refine our approximation by saying that a capability contains a memory address and a set of permissions. The address is the same size as normal — on a 64-bit CPU like Morello or CHERI RISC-V, that’s 64 bits. On those CPUs the set of permissions is also 64 bits in size. Depending on how you look at it, the size of a capability can be thought of as 128 bits (64 and 64, after all!) or 129 bits if you include the authenticity bit — however, since the authenticity bit can only be read separately from the other bits, we typically refer to a capability as 128 bits in size.
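One way to see this concretely is a small sketch, assuming a 64-bit purecap CHERI C target: pointers, and anything that must be able to hold one such as uintptr_t, become 16 bytes, while plain addresses and sizes stay at 8:
#include <stdint.h>
#include <stdio.h>

int main(void) {
    printf("sizeof(void *)    = %zu\n", sizeof(void *));    // 16: a pointer is a capability
    printf("sizeof(uintptr_t) = %zu\n", sizeof(uintptr_t)); // 16: must be able to hold a capability
    printf("sizeof(size_t)    = %zu\n", sizeof(size_t));    // 8: addresses and sizes are still 64 bits
    return 0;
}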
Putting a CHERI story together
At this point you can be forgiven a degree of confusion: I’ve given you lots of low-level detail, but no high-level story. Fortunately, with one more bit of detail, we can start to see a CHERI story come together.
Earlier I used read/write as an example of capability permissions. However, by far the most commonly used permissions are bounds, which record the range of memory that a capability can read/write from. We saw earlier an example of a capability-aware malloc returning a capability that encoded how big the block was. For example, malloc(4) might return a capability with an address of (say) 0x200, a lower bound address of 0x200, and an upper bound address of 0x204. If I create a capability with an address outside of the bounds 0x200-0x204 and try to read/write from it, my program will terminate with a SIGPROT.
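You can also peek at those bounds directly. Here’s a small sketch, assuming cheriintrin.h’s cheri_address_get, cheri_base_get and cheri_length_get helpers (the exact values depend on the allocator and on bounds compression):
#include <cheriintrin.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *buf = malloc(4);
    // Print the address the capability points at, plus the bounds it carries.
    printf("address %#lx, lower bound %#lx, length %zu\n",
           (unsigned long)cheri_address_get(buf),
           (unsigned long)cheri_base_get(buf),
           (size_t)cheri_length_get(buf));
    return 0;
}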
Not coincidentally, capability bounds work very well with pointer arithmetic as found in C. Consider the following snippet:
char *buf = malloc(4);
char *c1 = buf + 1; // In-bounds
char *c2 = buf + 2; // In-bounds
char *c3 = c1 + 1; // In-bounds (and equivalent to c2)
char *c4 = buf - 1; // Out-of-bounds
char *c5 = buf + 5; // Out-of-bounds
I can read/write to any of buf, c1, c2, or c3 without issue: but trying to read or write from c4 or c5 will terminate my program with a SIGPROT.
We now have enough detail to see a high-level CHERI story come together for the first time:
- Programming language: CHERI C treats what are normally pointers as capabilities, but does so in a way that allows the vast majority of normal C to be compiled unchanged with CHERI C.
- Operating system: malloc returns capabilities with appropriately restricted bounds.
- CPU: programs that do things that are potential or actual security problems (which includes, but is not limited to, code with undefined behaviour) tend to be terminated by a SIGPROT.
Put another way: these three parts of CHERI are a pragmatic way of reducing the security flaws that result from us using 1940s (“von Neumann”) architectures, and 1970s languages (C) and operating systems (Unix). For example, software running on CHERI would not have been subject to the classic Heartbleed security bug, since it relied on a buffer overread that would have caused the program to be terminated by a SIGPROT — that might have caused a denial-of-service, but that’s a lot better than giving an attacker access to your system.
What does CHERI protect us from?
Because CHERI can transparently protect us from many classic security flaws in C programs, it can end up seeming like magic. Indeed, I haven’t even talked about the other protections we gain almost for free, such as preventing a program from executing code at arbitrary addresses. Part of the reason that CHERI can do so much for C (or, indeed, C++) programs is that C/C++ make little effort to protect the programmer from such mistakes, even though the necessary information to do so is already present in the source code. However, as soon as we want to enforce security properties that are not directly inferable from source code, we realise that we have to use CHERI explicitly rather than implicitly.
For example, I’ve assumed in this post that malloc returns a capability whose bounds are restricted to the size of the block I requested. This seemingly obvious property cannot, in the innards of any realistic malloc, be inferred from the source code: the malloc author will have to explicitly use the CHERI C API (e.g. cheri_bounds_set) to ensure that the bounds are restricted to the appropriate memory range.
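To give a flavour of what “explicitly” means here, below is a deliberately toy sketch (a hypothetical bump allocator over a static arena, not any real malloc): without the cheri_bounds_set call, every pointer it hands out would carry the bounds of the whole arena rather than of the block requested:
#include <cheriintrin.h>
#include <stddef.h>

static char arena[4096];     // the toy allocator's entire pool of memory
static size_t next = 0;      // offset of the next free byte

void *toy_malloc(size_t size) {
    size_t aligned = (size + 15) & ~(size_t)15;   // keep blocks 16-byte aligned
    if (aligned < size || next + aligned > sizeof(arena))
        return NULL;                              // out of memory (or size overflowed)
    char *block = &arena[next];
    next += aligned;
    // Narrow the capability so the caller can only touch the bytes they asked for.
    return cheri_bounds_set(block, size);
}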
In some cases, such as malloc, the bounds-related security properties we want are fairly obvious and only require small tweaks to small sections of code. However, many security properties are either harder to define and/or require many sections of code coordinating correctly in order to enforce them.
For example, my program may have a secret that I want to hide away from most of the program so that it can’t accidentally leak to the outside world. Capabilities seem perfect for this, because one of the fundamental properties of a capability system is that it restricts the reachable set of capabilities at a given point in time (unlike a normal system where I can access nearly any part of virtual memory at any point). Indeed, CHERI gives me a number of exotic features such as sentries that allow me to enforce much more complex security properties.
While capabilities clearly have potential to make programs more secure in such cases, I’m sceptical that it will be something people want to do a great deal of. Capabilities are a dynamic (i.e. run-time) concept and humans are notoriously bad at reasoning about the dynamic (or “temporal”) behaviour of programs. My guess is that programmers will often make mistakes and give overly-powerful capabilities to parts of a program that should not have access to them, undermining the expected security guarantees.
In a sense, the problem is more general: the more complex my use of capabilities, the less confident I am that they will enforce the security properties I care about. In most cases, capabilities are used to stop bad things happening: a program being terminated by a SIGPROT generally indicates a bug, which will then surely be fixed. That means that I’m left hoping that capabilities will catch the bad cases that I haven’t thought of. History suggests that if a programmer hasn’t thought of, or hasn’t tested, something then it will not work as the programmer would hope.
A related question is: what happens if I’m using a language that offers me more protection than C/C++? For example, if I’m writing code in Python, or (non-unsafe) Rust, I can’t be subject to buffer overruns. Are bounds checks useful in such cases? As belt and braces, yes, but capabilities are not cost-free — they take up more memory than traditional pointers (e.g. placing more pressure on caches) and checking their validity must surely have some sort of performance impact on the CPU too. I might be willing to pay this cost on the small parts of a Rust program that use unsafe, but probably not elsewhere.
In summary, C/C++ programs have enough to gain from using capabilities everywhere that many users might choose to do so. However, for other languages, the situation is more complex: bounds checks are much less useful, but still impose a performance cost that people are probably unwilling to pay; and more complex uses of capabilities are hard to reason about. Interestingly, though, CHERI has another trick up its sleeve that I think is of much wider interest than is currently thought.
The hybrid CHERI story
Up until this point in this post I have explained CHERI as if it requires explicitly using 128-bit capabilities everywhere. CHERI has a second “mode” of operation where one can use both normal 64-bit pointers and 128-bit capabilities alongside each other. Conventionally, when a program uses capabilities everywhere it’s said to be a purecap program; when it mixes capabilities and normal pointers it’s said to be a hybrid program.
To many people’s surprise, many current practical CHERI systems use hybrid mode in, at least, their OS kernels, because converting a large OS kernel to purecap is often too much work to contemplate. When the kernel interacts with a purecap user program, it must bridge between the hybrid and purecap worlds.
What’s interesting about hybrid mode is that 64-bit pointer accesses are implicitly made against global 128-bit capabilities. The global capability I’ll use as an example is the Default Data Capability (DDC). One way of thinking about how this works is as follows: if I’m on Morello, executing normal A64 (not C64!) code, and then execute a load with a 64-bit pointer, the CPU will derive a new capability from the DDC with the address of the pointer and then do all its normal capability checks.
Showing how this works is a bit awkward, but I hope the following slightly simplified code helps:
// Get two normal 64-bit pointers to chunks of memory
char *b1 = malloc(4);
char *b2 = malloc(4);
// Restrict the (128-bit) DDC to only cover b1's block of memory
void *__capability new_ddc = cheri_ddc_get();
new_ddc = cheri_address_set(new_ddc, b1);
new_ddc = cheri_bounds_set(new_ddc, 4);
cheri_ddc_set(new_ddc);
printf("%c\n", b1[0]); // Succeeds
printf("%c\n", b2[0]); // SIGPROTs
This chunk is written in “hybrid” CHERI C, which is most easily thought of as “normal” C with 64-bit pointers, with an additional __capability qualifier which denotes a 128-bit capability. We allocate two normal blocks of memory (b1 and b2), receiving back two normal 64-bit pointers. We then read the DDC (which, by default, has bounds from 0 to the maximum possible virtual memory address) into a capability, restrict that capability to only cover b1, and then set the DDC to that new capability. We can then read and write via normal 64-bit pointers to b1 successfully, but as soon as we try to read from b2 our program is terminated with a SIGPROT.
Hybrid mode is considered passé in the CHERI world, but I see it as potentially interesting because by restricting the DDC, we implicitly define a sort-of “subprocess” within a process. Unix-esque processes offer simple, highly effective isolation but they are very heavyweight. Various sandboxing techniques have been developed over the years, but most are awkward to use and, often, easily bypassed. Hybrid “subprocesses” seem to me to be a new point in the design space.
Splitting programs up into multiple processes for security isn’t a new idea. A handful of existing programs, most notably OpenSSH, already do so (“privilege separation”): an attacker who takes over an individual process has a hard time exploiting the system beyond that process. However, inter-process communication is either comically slow (pipes) or difficult to use reliably (shared memory). CHERI’s hybrid mode can still use capabilities, and those are not checked relative to the DDC: thus capabilities can be used to communicate between compartments with very little performance penalty.
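To make that last point concrete, here’s a small hybrid C sketch in the same simplified spirit as the earlier DDC example; it assumes the __cheri_tocap cast and the cheri_bounds_set helper, whose exact spellings may differ between toolchains. The “compartment” receives only a capability, and its accesses are checked against that capability’s own bounds rather than against the DDC:
#include <cheriintrin.h>
#include <stdlib.h>

// The compartment only ever sees a capability; whatever the DDC currently
// allows, it can touch just the bytes this capability covers.
void compartment(char *__capability msg) {
    msg[0] = 'x';   // In-bounds: fine
    msg[4] = 'y';   // Out-of-bounds: SIGPROTs
}

int main(void) {
    char *buf = malloc(4);   // an ordinary 64-bit pointer in hybrid code
    // Turn it into a capability and narrow the bounds to just buf's four bytes.
    char *__capability msg = (__cheri_tocap char *__capability)buf;
    msg = (char *__capability)cheri_bounds_set(msg, 4);
    compartment(msg);
    return 0;
}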
Of course, hybrid “subprocesses” are not magic — they clearly provide less isolation than processes, and CHERI’s hybrid support has received much less attention than purecap. Still, at the very least hybrid mode offers a very interesting alternative to a purecap world. I think it’s fairly easy to imagine a future where software uses both processes and subprocesses to gain different security/performance trade-offs.
Summing up
There is a lot more I could have said: CHERI contains many, many details I haven’t even hinted at; and there are certainly other ways of using CHERI than the two stories I’ve outlined. I think, though, that this post is long enough already!
In terms of the two CHERI stories I’ve presented, it’s worth being explicit that the purecap story is very much the standard CHERI story at the moment, with the hybrid story being more speculative. The very different implications each story has for today’s and tomorrow’s software might give you some idea that CHERI is best thought of as a toolbox – and many tools within that toolbox have more than one possible use. Exactly which use(s) of CHERI will end up being most useful – which, to my mind, probably means “provides the best balance between security and performance” – remains unclear. And, of course, there are no guarantees that CHERI will enter the mainstream! But, whatever happens to CHERI, I think it is extremely helpful in opening our eyes to possibilities that most of us have previously not even considered.
Update (2023-07-05): I originally stated that all CHERI OS kernels were hybrid. I have been reliably informed that is no longer the case — there is a CheriBSD purecap kernel. CHERI Linux, though, is a hybrid-only kernel.
Acknowledgements: thanks to Jacob Bramley, Andrei Lascu, and Myoung Jin Nam for comments.
2023-07-05 09:15