Application developers continue to need newer versions of libraries, including core runtimes like the GNU C Library (glibc), for their applications. In this article, I'll look at some issues related to upgrading glibc in an operating system (OS) distribution. I also encourage you to read Florian Weimer's excellent blog post on the topic.
The problem
Deciding between a library rebase or continued backporting of commits involves a complex set of risks and rewards. For some customers and users, it is important not to rebase the library (ensuring the lowest risk of impact by change); but for others, the rebase brings valuable bug fixes (lowest risk of impact from known issues). In other cases, the newer library may perform better, even if the interfaces haven't changed, because it can take advantage of newer hardware or a newer Linux kernel (performance advantage to first mover).
There is no way to simultaneously satisfy all the requirements of slow-moving versus fast-moving development. The recent work in Fedora Modularity is aimed at solving the root of this problem, but there is a limit to this work. The further down the stack you go, the harder the problem becomes. The potential for breakage further up the stack increases. You can't always arbitrarily change a component's installed version without consequences, either at build time or at runtime.
The solution
What if the reward for the customer or user outweighed the risk? How would we deploy a rebased glibc (the option Florian identifies as conceptually simplest)? How could we mitigate some of the worst problems?
We can break down a possible solution like this:
- Delivering a new glibc.
- Build time isolation.
- Choosing to override the system glibc for the system or for an application.
- Validation and verification of a system with a new glibc.
For each step, I'll discuss how to reduce or mitigate the problems introduced by the rebase.
Delivering a new glibc
Fundamentally, the simplest solution is to rebase the entire library and accept the consequences. With modularity, we have the ability to deliver an alternate version of the library in a distinct rpm repository. Specific users can enable those repositories if they evaluate the risk vs. reward to be worthwhile. Without modularity, we can use software collections or an alternative package name (e.g., glibc-alt).
The straightforward answer is that a combination of modularity and alternative package names provides a robust solution. We have to install the entirely new glibc under a distinct path, for example /opt/glibc/2.29, along with the symlinks /opt/glibc/latest and a special /opt/glibc/release. That's it: we have a new glibc that doesn't impact the running system in any way.
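The layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it builds the tree in a temporary directory rather than in /opt, the helper name lay_out_glibc_tree is hypothetical, and the choice of 2.28 as the "release" version is an assumption made for the demo, not something a real package would hard-code.

```python
import os
import tempfile


def lay_out_glibc_tree(root, versions, release, latest):
    """Create <root>/<version>/lib64 directories plus 'release' and 'latest' symlinks.

    Hypothetical helper: a real package would ship these paths itself.
    """
    for version in versions:
        os.makedirs(os.path.join(root, version, "lib64"), exist_ok=True)
    # Relative link targets keep the whole tree relocatable.
    for name, target in (("release", release), ("latest", latest)):
        link = os.path.join(root, name)
        if os.path.islink(link):
            os.remove(link)
        os.symlink(target, link)
    return root


# Demo in a scratch directory standing in for /opt/glibc.
demo_root = lay_out_glibc_tree(
    os.path.join(tempfile.mkdtemp(), "opt", "glibc"),
    versions=["2.28", "2.29"],
    release="2.28",  # assumed distribution release version
    latest="2.29",
)
print(sorted(os.listdir(demo_root)))
```

Nothing outside the scratch directory is touched, which mirrors the point of the scheme: the new glibc sits beside the running system rather than replacing it.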
Next, I'll talk about how to use it.
Build time isolation
What is installed in /opt/glibc/release? The exact set of headers and libraries that are part of the distribution release, all in the form of an alternative system root. This is called a "sysroot" in toolchain terminology. You can point your compiler and linker at it with --sysroot, and that path can be used in preference to the system default when compiling an application (including the use of -Wl,-rpath and -Wl,--dynamic-linker).
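To make the three flags concrete, here is a small sketch that assembles a compiler command line for a given sysroot. The function name, the lib64 directory, and the ld-linux-x86-64.so.2 loader name are assumptions (they match a typical x86_64 layout, not necessarily the one a real package would use), and the command is only printed, never executed.

```python
import shlex


def sysroot_compile_command(source, output, sysroot,
                            libdir="lib64",
                            loader="ld-linux-x86-64.so.2",
                            compiler="gcc"):
    """Build an argv list that compiles and links against one glibc sysroot.

    libdir and loader are assumptions for x86_64; other architectures differ.
    """
    lib_path = f"{sysroot}/{libdir}"
    return [
        compiler,
        f"--sysroot={sysroot}",                          # headers and link-time libraries
        f"-Wl,-rpath,{lib_path}",                        # where to find shared libraries at run time
        f"-Wl,--dynamic-linker={lib_path}/{loader}",     # which program loader to record in the binary
        "-o", output,
        source,
    ]


cmd = sysroot_compile_command("hello.c", "hello", "/opt/glibc/2.29")
print(shlex.join(cmd))
```

The run-time flags matter because --sysroot alone only changes what the program is built against. Without -Wl,-rpath and -Wl,--dynamic-linker, the resulting binary would still load the system glibc when it starts.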
A more straightforward name for the sysroot would be "platform interface for glibc," which I discuss here in some detail. In this case, the /opt/glibc/release sysroot would be used as the default system sysroot, or we would create a symlink tree into it. The point is that you always compile against the "released" version of the libraries for the distribution.
No matter how many newer versions of the libraries you have installed in /opt/glibc, there is always a /opt/glibc/release that is the default sysroot for compiling applications. This approach mitigates one of the potential problems in a rebase, namely applications no longer compiling when an interface they depended upon is deprecated. The /opt/glibc/release sysroot will not deprecate any interfaces in the lifetime of the distribution release.
How then can we take advantage of the new glibc?
Overriding the system glibc
You can now have multiple versions of glibc installed in the distribution with the default being /opt/glibc/release. You also have two options, both of which are discussed in Florian's article, and we can realize them here.
You could compile your application specifically against /opt/glibc/2.29 and forever link against glibc 2.29. There are support implications for this approach, however, because glibc 2.29 may have limited support. Under the hood at this point, the system glibc runtimes are likely a symlink into /opt/glibc/release, which provides system glibc libraries.
What if it is valuable enough for you to change one system, or one container, to use the new glibc? If you have identified a benefit, then it is valuable to allow an alternatives-like selection scheme where one symlink can be changed to switch the system between /opt/glibc/release and /opt/glibc/latest (or some other specific version), but only for the runtime components, that is, shared library SONAMEs (not the glibc-devel symlinks which still point to /opt/glibc/release). With a quick switch of alternatives --set, you would be running all new processes with the new runtime. The point is that the developer makes the choice.
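The single-symlink switch can be illustrated with a short sketch. It does not use the alternatives tool itself. It shows the underlying idea of repointing one link atomically, so that a process starting at any moment sees either the old target or the new one, never a missing link. The link name "runtime" and the helper switch_runtime are hypothetical, and the demo runs in a temporary directory.

```python
import os
import tempfile


def switch_runtime(link_path, new_target):
    """Atomically repoint the symlink at link_path to new_target.

    A new link is created under a temporary name and then renamed over
    the old one; rename replaces the destination in a single step.
    """
    tmp_link = f"{link_path}.tmp-{os.getpid()}"
    os.symlink(new_target, tmp_link)
    os.replace(tmp_link, link_path)


# Demo: a scratch directory standing in for /opt/glibc.
demo = tempfile.mkdtemp()
for version in ("release", "latest"):
    os.mkdir(os.path.join(demo, version))

runtime_link = os.path.join(demo, "runtime")  # hypothetical name for the runtime selector
os.symlink("release", runtime_link)

switch_runtime(runtime_link, "latest")
print(os.readlink(runtime_link))
```

Already-running processes keep the libraries they mapped at startup, which is why the text above speaks of "all new processes" picking up the new runtime.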
The consequence is that rebasing glibc has validation implications for the rest of the software stack.
Validation and verification of a new system glibc
How do you quantify the risk of supporting a glibc rebase in a distribution? The only way is to actually do it and measure the failures. There is no other way short of modeling risk, but even a model without data is only estimation based on other measured failures. We have to embark on a rebase of glibc in the distribution to gain this experience, and hopefully get additional benefit. Interestingly, there is already a similar solution from the Fedora kernel.
The Fedora kernel team rebases the kernel used in all active Fedora releases following a staggered delivery approach. The same approach for specific glibc versions could allow us to gather significant experience doing a glibc rebase in the distribution. Keep in mind that we already do this in Fedora Rawhide, which keeps rolling in new glibc all the time. The point here would be to update active Fedora releases with a new stabilized version of glibc. For example, Fedora 29 with glibc 2.28 has been out for a while accumulating bug fixes and CVE fixes. Eventually, it will be beneficial to switch Fedora 28 and Fedora 27 to glibc 2.28 and provide all three active distributions with one updated and refreshed glibc, while Fedora 27 and 28 would continue to use their default /opt/glibc/release sysroot to isolate build-time changes from those packages and users.
First steps
In Fedora Rawhide, I have already added the required code to move glibc into a sysroot location. Similarly, Nick Clifton has also enabled full --sysroot support in the Fedora binutils packages (something we didn't have enabled before). What really needs to follow is the validation and verification of a new glibc. We need to accumulate the experience of doing these rebases, including the experience of knowing what not to do. If we can't do the release rebases, then it will be difficult to support an alternative glibc. For example, we need to thoroughly test the deployment with symlinks and alternatives for switching glibc during an upgrade.
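One small piece of such testing is confirming which glibc a process is actually running on after a switch. The sketch below is one possible check, not part of any existing Fedora tooling. It assumes a Linux system where Python exposes the CS_GNU_LIBC_VERSION configuration string, and it returns None elsewhere. The function names are hypothetical.

```python
import os


def parse_glibc_version(text):
    """Turn a string such as 'glibc 2.28' into the tuple (2, 28)."""
    version = text.split()[-1]
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def running_glibc_version():
    """Return the running glibc version as a tuple, or None if unavailable."""
    try:
        return parse_glibc_version(os.confstr("CS_GNU_LIBC_VERSION"))
    except (AttributeError, ValueError, OSError):
        # Not glibc, not Linux, or the name is unknown on this platform.
        return None


def runtime_is_at_least(required, running):
    """True if the running glibc is known and not older than `required`."""
    return running is not None and running >= required


print(running_glibc_version())
print(runtime_is_at_least((2, 28), running_glibc_version()))
```

Comparing integer tuples rather than strings avoids the usual pitfall where "2.9" sorts after "2.28".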
I look forward to following in the Fedora Linux kernel team's footsteps. Attempting a stabilized rebase of glibc will provide our users with faster access to bug fixes and CVE fixes. It will also give the Fedora glibc team the much-needed experience to tackle future rebases. The consequence of this experience will have long-reaching benefits.
One day, perhaps we can enable an entirely new glibc on later Fedora as a developer option!
Last updated: November 1, 2023