Surely, you too have been frustrated, while single-stepping optimized programs in symbolic debuggers, by the Brownian motion in the source code, and by never being sure, when you reach a certain source line (if you can reach it at all), whether or not earlier lines have taken effect. Our frustration is about to be significantly alleviated, thanks to two new pieces of technology about to be contributed to the GNU toolchain.
Statement Frontier Notes are stable markers of source locations, e.g. the beginnings of statements. They are introduced very early in the compilation, and maintained in place relative to pre-existing source location markers such as the "variable binding" side-effect debug stmts introduced in the VTA (variable tracking at assignments) project.
Such stand-alone markers, that serve as fixed points for debug information generation, while leaving the compiler free to optimize and rearrange code, have proven to be a solid foundation for improving variable location and value information. They are now taken to their next level, identifying ideal inspection points to observe the program state, as one would expect from an unoptimized execution of the source code.
When statement frontier notes are enabled, GCC will use them to emit the is_stmt column of line number tables, so that single-stepping will advance from one statement to the next, as expected, even if instructions generated from multiple lines around them ended up shuffled all about. Individual instructions will still refer back to their original source line, so that it will still be possible to make sense of e.g. machine instruction stepping.
Furthermore, is_stmt will no longer be associated with the first machine instruction in the stream generated out of a given source code line, but rather with a program location that is logically coherent with the variable binding events that are expected to have taken place before that line. When you stop at the recommended breakpoint for a line, you get to observe all variables bound to the expected values, as much as they are available or computable.
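To make the role of is_stmt concrete, here is a minimal sketch of a line number table in which instructions from three source lines got interleaved. The Row type, the addresses, and the helper names are all hypothetical, made up for illustration; real tables are encoded in DWARF and read through a debug information library, not built by hand like this.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    address: int   # machine code address of the instruction
    line: int      # source line the instruction was generated from
    is_stmt: bool  # True at the recommended breakpoint for that line


# Hypothetical table, sorted by address, for source lines 10, 11 and 12.
LINE_TABLE = [
    Row(0x00, 10, True),
    Row(0x04, 12, False),  # instruction from line 12, hoisted early
    Row(0x08, 11, True),
    Row(0x0C, 10, False),  # leftover from line 10, sunk later
    Row(0x10, 12, True),
]


def statement_stops(table):
    """Rows at which source-level single-stepping would stop."""
    return [row for row in table if row.is_stmt]


def line_at(table, address) -> Optional[int]:
    """Source line an address maps to, as instruction stepping would show."""
    best = None
    for row in table:  # relies on the table being sorted by address
        if row.address <= address:
            best = row
    return best.line if best else None


if __name__ == "__main__":
    print([(hex(r.address), r.line) for r in statement_stops(LINE_TABLE)])
    print(line_at(LINE_TABLE, 0x04))
```

Stepping by statement visits lines 10, 11 and 12 in source order, while the instruction at 0x04 still maps back to line 12, the line it was generated from.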
Now, wouldn't that be wonderful, or at least, well, nice to have? Alas, there are some slight complications. For example, at the ideal inspection point to observe the state at a certain source program location, there may be a machine instruction associated with an unrelated line. Code movement may even leave the ideal inspection points for multiple statements at the same machine instruction!
We could emit line number tables and variable location lists encoding all this information, with multiple inspection points at the same executable address, and with variable bindings starting and ending at the same address. But that would not help debuggers make sense of it: such empty live ranges for variable bindings would also be empty of meaning, because they could not in any way be related to the inspection points that indicate the expected progress of the program.
Enter location views. We have devised a way to derive view counters from line number tables, so that there can be multiple views at the same code address, and extended variable location lists, also in a backward compatible way, so that binding ranges can name individual views as starting and ending points.
This enables debug information consumers to stop a program at the recommended breakpoint for a statement and observe the state that should be observable at that point of the source program, even if all the instructions associated with that statement were optimized or moved away. They can then step over that statement to the next, and observe the side effects of the former, even if the recommended breakpoint for both is at the same address: only the location view advances.
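The idea can be sketched with a simplified model. It assumes that the view counter starts at zero whenever the address advances and is incremented for each further line table row at the same address; the actual numbering rules in the proposed DWARF extension are more detailed. The function names, the tuple layout, and the sample data below are hypothetical.

```python
def assign_views(rows):
    """Number the views in a line table.

    rows: list of (address, line) pairs, sorted by address.
    Returns a list of (address, view, line) triples.
    Simplifying assumption: the view resets to 0 when the address changes
    and increments for each additional row at the same address.
    """
    numbered = []
    prev_addr = None
    view = 0
    for addr, line in rows:
        view = view + 1 if addr == prev_addr else 0
        numbered.append((addr, view, line))
        prev_addr = addr
    return numbered


def value_at(loclist, addr, view):
    """Look up a variable's binding at a given (address, view) point.

    loclist: list of (start, end, value) entries, where start and end are
    (address, view) pairs and each range is half-open: start <= point < end.
    """
    point = (addr, view)
    for start, end, value in loclist:
        if start <= point < end:
            return value
    return None  # not bound at this point


# Hypothetical case: the statements at lines 20, 21 and 22 left no
# instructions of their own, so all three stop at address 0x40.
ROWS = [(0x40, 20), (0x40, 21), (0x40, 22), (0x48, 23)]

# Hypothetical bindings for a variable x: line 20 binds it to 1 and
# line 21 rebinds it to 2, each taking effect at the following view.
X_LOCLIST = [
    ((0x40, 1), (0x40, 2), "1"),
    ((0x40, 2), (0x48, 0), "2"),
]

if __name__ == "__main__":
    for addr, view, line in assign_views(ROWS):
        print(hex(addr), view, line, value_at(X_LOCLIST, addr, view))
```

With plain addresses as range boundaries, both bindings of x at 0x40 would cover empty ranges. With views, a consumer stopped at 0x40 sees x unbound at view 0, bound to 1 at view 1, and bound to 2 at view 2, without the program counter moving.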
We have listed advantages for interactive debuggers, but these can often work on non-optimized programs instead, and the new features benefit optimized programs only. Debug information consumers that must operate on optimized programs, such as monitors that inspect the internal state of optimized programs in production, will benefit significantly too: since recommended inspection points are logically sequenced with respect to variable bindings, they will be more likely to obtain the desired information, at least for inspection points derived from the is_stmt markers in line number information.
We envision that such monitors may benefit from additional markers and view-augmented locations, such as inspection points for the entry of an inlined function, so that the monitor can inspect a state in which all arguments are bound, rather than at any random instruction from the inlined function that may have been scheduled much earlier.
Markers for exit points, that enable the monitor to determine the value a function is about to return, while still within the scope of the return statement, are also high on the wish list. All of these will require further extensions to debug information and to the toolchain, but they are likely to be built on the foundations of statement frontier notes and location views.
This project was a long time coming: it was first published and presented at the GCC Summit 2010, but it only got a working implementation in 2017. It is implemented in GCC, Git branch aoliva/SFN; with binutils+gdb Git branch users/aoliva/SFN, one gets more compact line number information than GCC alone can emit. Location view numbering has been submitted as a proposed extension to the DWARF debug information format standard.
Systemtap and GDB can already use is_stmt markers, so they can gain from statement frontier notes without any further effort; as for location views, we expect debug information consumers will gain support for them in the not-too-distant future.
Last updated: June 15, 2023