In C, C++, and some other programming languages, the term aliasing refers to a situation where two different expressions or symbols refer to the same object. When references access that object in different ways—as both reads and stores—there are consequences for the order in which these mixed accesses can happen. The value that is stored first is expected to be read by the subsequent access. In many instances, aliasing is harmless: It is common, safe, and usually optimally efficient to use two pointers of the same type to read, and even to write to the same object. But in some cases, using aliasing symbols for mixed accesses is less benign, and can adversely affect the correctness or efficiency of your code.
Although there are quite a few articles on this subject, most tend to focus on the rules and requirements outlined in the standards (such as the strict aliasing rule). In this article, I focus on the details of the C and C++ language restrictions, their challenges and pitfalls, and examples demonstrating the restrictions' beneficial effects in optimized code. In Part 2, I will present exemptions from aliasing, which can help you get around the restrictions more or less safely. I also consider some of the common pitfalls of aliasing and mixed accesses, and the actual problems these pitfalls might cause.
About the examples
I use many small coding examples to illustrate my main points. The code is intentionally simplistic and not meant to be useful beyond illustrating the material. In addition, while optimization is the article's main topic, in most cases, it would be of negligible value to optimize the code examples unless you were executing many of them in close succession. That situation is often the case when such code appears in loops—in particular (but not exclusively), loops that might lend themselves to vectorization. I leave extending the examples to their more realistic "loopy" equivalents as an exercise for interested readers. Additionally, throughout the article, I include links to my examples on Compiler Explorer so that you can see how some popular compilers handle them. I encourage you to use that site to experiment with the examples. I'd also like to take this opportunity to thank the site's authors for a remarkably helpful tool and service.
Some of the examples illustrate the effects of invalid code on optimization results. It's important to emphasize that invalid code results in undefined behavior for the entire program. The effects seen in small examples can and very likely will be different in more realistic programs. In more complex scenarios, these effects could be masked by the compiler's limited ability to follow the control flow or track the data flow through a program, translation unit, or function. Effects might also depend on the constant values or ranges of values of operands used in the code, all of which typically depends on factors such as optimization options, the target architecture, and the specific compiler and its version.
Objects and types in C and C++
Object models and type systems capture some of the most fundamental principles of the C and C++ languages. Both languages rely on object identity: The fact that each named object is uniquely designated by its name, and that no other identifier denotes the same object. Because all objects must be distinct from each other, and because null is a valid pointer that doesn't point to any object, no two object addresses can be equal to one another, or to null. (The notable exception is the equality of the address of the first sub-object to that of its enclosing object. It is also possible that a pointer just past the end of one object will compare equal to one to an unrelated object that happens to be stored at that location, but that is happenstance and never guaranteed.)
Access by compatible types
Whether or not they have a name, all objects have a type, and their value can only be meaningfully interpreted within that type. It rarely makes sense to interpret, for example, an object of type int
as if it were a float
, or vice versa. Object values must be converted from one type to another (usually by casting) in order to be reinterpreted. But the objects themselves must only be accessed—that is, have their value stored or read—within their given type. For a declared object that has a name, the type is simply the type used to declare it. For a dynamically allocated object, the type (called the effective type) is the type of the last store to the object.
Note: The same type requirement isn't entirely accurate because it doesn't capture an important subtlety: That signed and unsigned forms of the same underlying integer types are considered the same, as are differently qualified forms of the same underlying type. But those are minor details.
The concept of size
Related to the concept of a type is the notion of size. It is only possible to access objects, including member sub-objects and array elements, within the boundaries imposed by their size. In standard C and C++, the size of every object and type is nonzero, but some compiler extensions allow exceptions to this guarantee. In most cases, the size is determined by the object's type. It is useful to call out this constraint separately, however, because it is easily violated in accesses to aggregates (meaning structures, particularly arrays).
Informally, the term aliasing in C and C++ programming most commonly refers to breaking the type rules I've just introduced. Typically, aliasing means using a type other than the type of the value stored in it. When it involves arrays, aliasing might mean using an index value that's outside the bounds of the array and accessing another object stored adjacent to the array. In some cases, aliasing can also mean using an identifier to access an apparently different object than the one it declares. Both C and C++ outline a handful of exceptions where such accesses are permitted; but, those aside, the effects of aliasing are largely not defined. Compilers rely on programs abiding aliasing restrictions to optimize away costly memory reads. Bad things can happen when a program violates those restrictions. In the next sections, I'll show many examples of what I mean by that.
Accesses to distinct objects
To start, consider the following function (also see the Accesses to distinct objects example in Compiler Explorer):
extern int a[]; extern int b[]; int f (int i, int j) { int t = a[i]; b[j] = 0; // cannot change a return a[i] - t; // can be folded to zero }
Most optimizing compilers, including GNU Compiler Collection (GCC), replace the a[i] - t
expression with the constant zero, because a
andb
designate distinct objects that cannot alias. Regardless of the array indices' values, no valid store to one can change the value of the other. Therefore, a[i]
and t
must be equal, and the result of the subtraction must be zero. The same optimization wouldn't be possible if either a
or b
were pointers to the same type, because both could point to the same object; in other words, they could alias.
There isn't much a program can do to run afoul of this optimization—unless it is improperly using one of the common compiler extensions. I'll show examples of such extensions when I discuss aliases and weak symbols in Part 2.
Accesses by incompatible types
When pointers point to distinct types, as demonstrated by the function in the next example, GCC and most other compilers will optimize the result to zero (also see Accesses via pointers to incompatible types):
int f (int *a, long *b) { int t = *a; *b = 0; // cannot change *a return *a - t; // can be folded to zero }
Because a
and b
are declared to be pointers to incompatible types, and because C and C++ require that objects have their stored value accessed only by an lvalue of a compatible type, the store to *b
cannot affect the value of *a
cached in the variable t
, which is typically a register. Therefore, the operands in the subtraction expression must be equal, and the result must be zero.
This requirement applies when accessing members of incompatible type structure, even if the member types are compatible. Thus, in the following example, the compiler can fold the return
expression to zero. It works because the structure types are distinct for the members being accessed (see Accesses to members of incompatible structs):
struct A { int i; }; struct B { int i; }; int f (struct A *a, struct B *b) { int t = a->i; b->i = 0; // cannot change a->i return a->i - t; // can be folded to zero }
It's important to keep in mind that the optimization I just showed wouldn't be valid if a
and b
pointed to the same type—unless they were declared using restrict
. (I'll talk more about the restrict
keyword soon.) It should now be obvious that calling f
against the rules with a
and b
pointing to the same object would still return zero, regardless of the object's initial value. But such an outcome would likely be surprising to the unsuspecting programmer and result in a bug. It's the main reason why we are only allowed to access objects by the lvalues of their type.
It is also instructive to see what might happen if a function like f
were called with the same pointer. Such a call would be invalid because of the requirement to access every object by an lvalue of its type. The code's behavior would necessarily be undefined, and we would need to consider any expectations of the code's behavior in that light.
It seems reasonable to expect a compiler to report a violation of this requirement upon detecting one. However, none of the popular compilers issue such a warning (see Invalid access via a pointer to an incompatible type).
Accesses to distinct array elements
As shown in the following example, a compiler can also fold the return
expression to zero (see Accesses to distinct array elements of a matrix):
int a[2][2]; int f (int i, int j) { int t = a[1][j]; a[0][i] = 0; // cannot change a[1] return a[1][j] - t; // can be folded to zero }
This time, the rationale is more subtle: It depends on the requirement that the index in every access must be within the bounds of the array at each dimension. The two arrays involved in the accesses in the function—namely a[0]
and a[1]
—are distinct elements of the larger array a
, so they must not alias one another. A store to either of the two elements of a[0]
cannot affect the value of either element of a[1]
. Consequently, the invalid call f (2, 0)
would likely return an unexpected result if a[1][0]
were initially nonzero. The -Warray-bounds warning seen in the modified example detects a subset of these invalid array references (see Overlapping access to distinct array elements of a matrix).
Accesses to member arrays
Conceptually, every ordinary non-array object can be viewed as an element of a one-element array. The only difference between the two is syntactic. In addition, because every object is represented in memory as a continuous sequence of bytes, we can consider every object—including a struct
and its members—to be such a sequence. However, except for the special case of accesses by character types (see "Aliasing by character types" in Part 2), we must confine access to an array member to the bounds of the array. It is not valid to view an array member as aliasing other members adjacent to it. As an example, consider the following function (also see Access to a member array):
struct A { char a[8], b; }; int f (struct A *p, const char *s) { int t = p->b; for (int i = 0; ; ++i) if (!(p->a[i] = s[i])) // cannot change p->b break; return p->b - t; // can be folded to zero }
In this example, the for
loop assignments to p->a[i]
may only modify elements of the p->a
array. Similar to the previous example, we are not permitted to write outside the boundaries of a sub-object and into adjacent members of the struct
. The issue is not the potential for padding between the members, but that writing outside the boundaries of the sub-object would imply that distinctly named objects would alias. Because of this restriction, the compiler can again fold the subtraction in the return
statement to zero. You will find this optimization in both GNU Compiler Collection (GCC) and Intel C++ Compiler (ICC).
Further restrictions
General restrictions in the C and C++ languages enable compilers to optimize for many use cases. However, programmers often must satisfy even stricter constraints in our code, which compilers may not be able to infer without additional annotations. In this section, we will consider annotations that are either specified by the language standards or provided as compiler extensions.
The restrict
keyword
The restrict
keyword is a type qualifier in C. Although C++ doesn't formally specify it, compilers commonly support it under the alternate syntax of __restrict
or __restrict__
. We can apply this qualifier to pointers, and also to references in C++. In either case, adding the qualifier indicates that the referred objects are not both read from and written to via distinct expressions. We can also assume that these expressions are not based on the same pointer.
The formal definition of restrict
is somewhat involved, but the basic use case is straightforward: It is used to qualify one or more function-pointer arguments to indicate that the function accesses objects that do not overlap. In C++, the use case could include the this
pointer, as well as any reference arguments. As an example, here's restrict
used in a declaration of the standard strcpy
function:
char* strcpy (char * restrict dst, const char * restrict src);
This declaration expresses the constraint that the function may not store into any element of the array that dst
points to, which it also reads from through src
. Because compilers usually "know" the effects of many standard functions, this constraint is mainly useful for user-defined functions. So, for instance, using restrict
to declare the pointer arguments would open up the optimization opportunity shown in the example of Accesses via pointers to incompatible types, even to pointers to the same type (also see Access by restricted pointers):
int f (int * restrict a, int * restrict b) { int t = *a; *b = 0; // cannot change *a return *a - t; // can be folded to zero }
Naturally, passing f
the address of the same object (or in fact, even distinct addresses that would result in overlapping accesses by the function) violates the restriction and causes an unexpected result (see Violating restricted pointers). The GCC -Wrestrict warning detects this problem when a
and b
overlap exactly, but only when the two pointers are one and the same object. Despite the warning, GCC still folds the subtraction into zero. That's because it only detects the problem at the point of the call to f
, and even that only under very limited circumstances. It does not diagnose the equivalent call in function g
.
The restrict
keyword and const
Using restrict
to declare function-pointer arguments is useful for optimizing the body of a function. We can also use this syntax to enable optimizations in function callers. Combined with the const
qualifier, a restrict
-qualified pointer expresses the following constraint: That if the object it points to is accessed by any means, it cannot be modified during the lifetime of the pointer itself.
Because f
cannot modify the object its argument points to, it is possible for a compiler to fold the return
expression in the following function to zero (see Restricted const pointer parameters):
void f (const int * restrict); int g (int *a) { int t = *a; f (a); // cannot change *a return *a - t; // can be folded to zero }
If you check the live demo for this example, you will see that the compilers I tested do not yet leverage this optimization.
Restricting const
pointer parameters wouldn't be safe in the absence of the restrict
qualifier. First, the language doesn't prevent modifying objects that are pointed to by const
-qualified pointers, unless the objects themselves are declared const
. Because of this, it is valid for a program to cast the constness of the pointer away and use the result to modify the object. Second, let's say that we tried the example above without restrict
. If function f
had a way to access the object pointed to by p
, by some other means than through p
, it could modify it even without casting the constness of its argument away. Using the restrict
qualifier, in conjunction with the read from *a
in g
, rules out both of those possibilities; it effectively guarantees that the call to f
will not modify by any means the object whose address it has passed.
In standard C, we can apply the restrict
keyword just as effectively at the function-prototype scope and at block scope. Thanks to its transitive nature, we can use restrict
to express aliasing guarantees, even without modifying the API. For instance, the following function (see Restricted local const pointer variables) is equivalent to the previous code, and can be similarly optimized:
void f (int *); int g (int *a) { const int * restrict p = a; int t = *p; f (a); // cannot change *a return *a - t; // can be folded to zero }
As in the previous example, no compiler, including GCC, uses this restriction to optimize code.
A closing thought about restrict
If f
were to modify the object that a
and p
point to, any optimization assumptions the compiler made based on the restrict
keyword (such as folding the return
expression to zero) would result in undefined behavior. In view of that, one might hope that compilers would at least detect the most obvious misuses of restrict
and issue a warning. Unfortunately, no known compiler does. As a result, restrict
is as dangerous as it is useful.
Attributes const
, pure
, and noalias
Some compilers provide extensions to specify additional aliasing restrictions. GCC and compatible compilers support the const
and pure
function attributes. We can use these attributes as an optimization hint, declaring a function that either doesn't access observable program state at all (attribute const
) or does so only by reading the objects referenced by that function's pointer arguments (attribute pure
).
Visual C/C++ and compatible compilers support using __declspec(noalias)
to declare that, similar to attribute pure
, a function doesn't access program state except by dereferencing its pointer arguments. Consider the effect of these hints in the program below (also see Restricting aliasing by function):
#if __GNUC__ # define NOALIAS __attribute__ ((pure)) #elif _MSC_VER # define NOALIAS __declspec (noalias) #else # error NOALIAS not supported #endif NOALIAS int f (int *); int g (int *a) { int t = *a; int x = 1; x = f (&x); // can be eliminated return *a - t ? x : 0; // can be folded to zero }
We can assume the call f (&x)
does not change any program state except for x
itself, so we know the expression *a - t
can be folded to zero; and therefore, so can the entire return
statement. Consequently, given the function does not use the value of x
, we can also eliminate the call to f
.
As an experiment, it is instructive to see what might happen when the noalias
violates the requirement not to change the program state. Unfortunately, despite being obvious, none of the tested compilers diagnoses this bug (see Violating function aliasing restriction). Some appear to detect and act on it, while others try to mitigate it only in some contexts. Other compilers ignore it completely. Given that undefined behavior is the consequence of violating an explicit requirement, none of this should be terribly surprising, except perhaps the lack of a diagnostic. It isn't difficult for a compiler to detect when a function modifies non-local state, contrary to the pure
or noalias
requirement. Regrettably, as of now, none does.
Conclusion to Part 1
In Part 1 of this article, I focused on the details of C and C++ language restrictions and their beneficial effects in optimized code. In Part 2, I will present a variety of exemptions found in popular compilers, that developers can use to get around aliasing restrictions more or less safely. I will also discuss some of the common pitfalls of aliasing and mixed accesses, and some of the problems these pitfalls might cause.
Last updated: February 6, 2024