Introduction
Compiled files, often called binaries, are a mainstay of modern computer systems. But it is often hard for system builders and users to find out more than very basic information about these files. The Annobin project exists as a means to answer questions like:
- How was this binary built?
- What testing was performed on the binary?
- What sources were used to make the binary?
The Annobin project is an implementation of the Watermark specification, which details how to record extra information in a binary. One important feature of this specification is that it includes an address range for the information stored. This makes it possible to record the fact that part of a binary was compiled with one set of options and another part was compiled with a different set of options.
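The following minimal C sketch makes the address-range idea concrete. It is not Annobin code. It keeps a list of ranges, each tagged with the options that applied to it, and looks up which options cover a given address. The `range_note` struct and `options_for_address` function are hypothetical, and the ranges are assumed to be half-open, [start, end).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one set of build options covering one address range.
   Ranges are assumed to be half-open: [start, end). */
struct range_note {
    uint64_t start;
    uint64_t end;
    const char *options;
};

/* Return the options recorded for ADDR, or NULL if no range covers it. */
static const char *options_for_address(const struct range_note *notes,
                                       size_t count, uint64_t addr)
{
    assert(notes != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (addr >= notes[i].start && addr < notes[i].end)
            return notes[i].options;
    }
    return NULL;
}
```

Suppose one range is [0x1000, 0x1800) and another is [0x1800, 0x2000). Under this sketch, address 0x17ff resolves to the first set of options, 0x1800 resolves to the second, and 0x2000 resolves to neither.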
How It Works
The information is stored in a binary as a series of ELF Notes, and held in a special section. ELF Notes were chosen because they are a well-defined structure, recognizable to any tool that manipulates ELF files, and because they do not get stripped out of files when debug information is removed. The section containing the notes is also marked as not being loadable, so it does not take up any space in the run-time image of the program.
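An ELF note has a simple, fixed layout. It starts with a header of three 32-bit words: name size, description size and type. The name follows, then the description, each padded to a 4-byte boundary. The sizes in the header do not include the padding.

The C sketch below serializes one note into a byte buffer to show that layout. Some details are assumptions or simplifications:
- The `note_header` struct mirrors `Elf32_Nhdr` from `<elf.h>`, but it is declared locally so the example builds anywhere.
- `append_note` is a hypothetical helper.
- The 4-byte alignment is assumed for the build-attribute section.
- Fields are written in host byte order, which is only correct when creating notes for the same kind of machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same layout as Elf32_Nhdr / Elf64_Nhdr in <elf.h>: three 32-bit words. */
struct note_header {
    uint32_t namesz;  /* name length including its NUL, excluding padding */
    uint32_t descsz;  /* description length, excluding padding */
    uint32_t type;    /* e.g. 0x100 for an OPEN note */
};

/* Round N up to the next multiple of 4 (assumed note alignment). */
static size_t align4(size_t n) { return (n + 3) & ~(size_t)3; }

/* Append one note to BUF (capacity CAP) at offset POS; returns the new offset.
   NAMESZ is passed explicitly because names may contain non-printing bytes.
   The header is written in host byte order. */
static size_t append_note(uint8_t *buf, size_t cap, size_t pos,
                          const char *name, size_t namesz,
                          const void *desc, size_t descsz, uint32_t type)
{
    struct note_header h = { (uint32_t)namesz, (uint32_t)descsz, type };
    size_t total = sizeof h + align4(namesz) + align4(descsz);
    assert(pos + total <= cap);          /* caller must supply enough room */

    memset(buf + pos, 0, total);         /* zero the padding bytes */
    memcpy(buf + pos, &h, sizeof h);
    memcpy(buf + pos + sizeof h, name, namesz);
    if (descsz > 0)
        memcpy(buf + pos + sizeof h + align4(namesz), desc, descsz);
    return pos + total;
}
```

For example, a name of 7 characters plus its NUL (namesz 8) and a 16-byte address range produce a note of 12 + 8 + 16 = 36 bytes.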
The Watermark specification is designed so that when binary files are linked together the notes can just be concatenated, and they will remain valid. The specification also includes a set of rules for merging the notes, in order to reduce their size, if that becomes an issue for the user.
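Every note carries its own sizes, so a reader can walk a section by hopping from one header to the next. This is also why two note sections placed end to end still parse as a single sequence. The minimal C sketch below counts the notes in a buffer. It assumes 4-byte alignment and host byte order, and `count_notes` is a hypothetical helper rather than part of Annobin or readelf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round N up to the next multiple of 4 (assumed note alignment). */
static size_t align4(size_t n) { return (n + 3) & ~(size_t)3; }

/* Count the notes in BUF[0..LEN). Returns -1 if a note runs past the end. */
static long count_notes(const uint8_t *buf, size_t len)
{
    long count = 0;
    size_t pos = 0;
    assert(buf != NULL || len == 0);
    while (pos < len) {
        uint32_t namesz, descsz;
        if (len - pos < 12)
            return -1;                       /* truncated header */
        memcpy(&namesz, buf + pos, 4);       /* header word 0: name size */
        memcpy(&descsz, buf + pos + 4, 4);   /* header word 1: description size */
        size_t size = 12 + align4(namesz) + align4(descsz);
        if (size > len - pos)
            return -1;                       /* truncated name or description */
        pos += size;
        count++;
    }
    return count;
}
```

If you concatenate the bytes of two valid note sections and pass the result to `count_notes`, the result is the sum of the two individual counts. This is the property that lets notes survive linking.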
The notes can be generated by anything, although in the case of the Annobin project they are created by a plugin for the GNU Compiler Collection (GCC). The plugin records most of the notes when it starts up, by scanning the GCC command line and the compilation state. It also inserts itself into the compilation process so that it can monitor changes to how individual functions are compiled and, where relevant, record those changes too.
The readelf program can be used to extract the notes from a compiled binary. It decodes the information and displays it in human-readable form.
How To Use It
To enable the Annobin plugin, use the GCC command line option: -fplugin=annobin
If GCC cannot find the plugin, then it may be necessary to add the -iplugindir option as well: -iplugindir=<path/to/dir/containing/annobin>
Note for Fedora package maintainers: the Annobin plugin is enabled automatically when the standard rpm build macros are used.
This should be all that is necessary to start recording information in a binary. In order to see if the plugin is working, the readelf program can be used to examine the notes:
readelf --notes --wide <file>
Most binary files already contain other types of notes, so in order to find the ones created by Annobin, look for ones whose "Owner" field starts with the letters "GA":
Owner                        Data size    Description
GA$<version>3p4              0x00000010   OPEN    Applies to region from 0x7da to 0x838
GA$<tool>gcc 7.2.1 20170915  0x00000000   OPEN    Applies to region from 0x7da to 0x838
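In the output above, the tool note has a data size of zero, yet readelf still reports an address range for it. Our reading is that an OPEN note with an empty description applies to the same region as the note before it. A 16-byte description, by contrast, holds a start and an end address.

The C sketch below assigns ranges that way. Some details are assumptions or simplifications:
- The inheritance rule, the `parsed_note` struct and the `resolve_ranges` helper are assumptions made for illustration.
- It only handles 64-bit addresses in host byte order; 32-bit targets would use shorter descriptions.

Check the Watermark specification before relying on any of this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A decoded note: DESC points at DESCSZ bytes of description data. */
struct parsed_note {
    const uint8_t *desc;
    uint32_t descsz;
    uint64_t start, end;   /* filled in by resolve_ranges */
};

/* Assign an address range to every note. A 16-byte description is read as
   two 64-bit addresses (host byte order); an empty description inherits the
   previous note's range. Returns 0 on success, -1 if a note has no range. */
static int resolve_ranges(struct parsed_note *notes, size_t count)
{
    int have_range = 0;
    uint64_t start = 0, end = 0;
    assert(notes != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (notes[i].descsz == 16) {
            memcpy(&start, notes[i].desc, 8);
            memcpy(&end, notes[i].desc + 8, 8);
            have_range = 1;
        } else if (notes[i].descsz != 0 || !have_range) {
            return -1;     /* unsupported size, or nothing to inherit */
        }
        notes[i].start = start;
        notes[i].end = end;
    }
    return 0;
}
```

Given a version note with the range 0x7da to 0x838 followed by a tool note with an empty description, this sketch gives both notes the same range, which matches the listing.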
Older versions of readelf do not understand these notes, so their output might look like this:
Owner                  Data size    Description
GA$3p4                 0x00000010   Unknown note type: (0x00000100)
GA$gcc 7.2.1 20170915  0x00000000   Unknown note type: (0x00000100)
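The difference between the two listings comes from how the owner name is encoded. As we understand the Watermark specification, the name has four parts:
- the letters "GA";
- a type character: '$' for a string value, '*' for a numeric value, '+' for boolean true and '!' for boolean false;
- an attribute identifier, which can be a single byte such as 1 for version or 5 for tool;
- the value.

Newer readelf versions print the identifier byte as <version> or <tool>. A non-printing byte would explain why older versions show nothing there.

The minimal C decoder below follows those assumptions. The numeric IDs and the `describe_owner` helper are assumptions for illustration, and only single-byte identifiers are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Describe the owner NAME of a build-attribute note into BUF.
   Assumed layout: "GA", a type character, a one-byte attribute ID, a value.
   Only IDs 1 (version) and 5 (tool) are named here. Returns 0 on success. */
static int describe_owner(const char *name, char *buf, size_t bufsz)
{
    const char *kind, *attr;
    assert(name != NULL && buf != NULL);
    if (strncmp(name, "GA", 2) != 0 || name[2] == '\0' || name[3] == '\0')
        return -1;                              /* not a build-attribute note */
    switch (name[2]) {                          /* value type character */
    case '$': kind = "string";     break;
    case '*': kind = "numeric";    break;
    case '+': kind = "bool:true";  break;
    case '!': kind = "bool:false"; break;
    default:  return -1;
    }
    switch (name[3]) {                          /* one-byte attribute ID */
    case 1:  attr = "version"; break;
    case 5:  attr = "tool";    break;
    default: attr = "other";   break;
    }
    /* For string attributes the value follows the ID byte as text. */
    int n = snprintf(buf, bufsz, "%s (%s) = %s", attr, kind,
                     name[2] == '$' ? name + 4 : "(binary)");
    return (n < 0 || (size_t)n >= bufsz) ? -1 : 0;
}
```

For example, passing the raw bytes `"GA$\x01" "3p4"` (split into two literals so that `\x01` is not read as `\x013`) yields "version (string) = 3p4". Newer readelf shows the same note as GA$<version>3p4.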
The Annobin project includes some example scripts that demonstrate how these notes might be used to perform various checks. The scripts are documented in Annobin's info file and inside the scripts themselves. Here is a quick overview:
built-by.sh
Tries to determine which tool compiled the binary. Uses the notes if possible, but tries several other methods as well.
check-abi.sh
Checks the binary to see if it has been built with object files that have different ABIs (and hence might not be compatible).
hardened.sh
Checks the binary to see if it has been built with the expected set of hardening options.
These are just examples. Other scripts can be written and other notes can be recorded in binaries.
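For instance, a check in the spirit of check-abi.sh only needs to confirm that every note recording a given attribute carries the same value. The C sketch below does that over attributes that have already been decoded. It does not reproduce the script's actual logic. The `attribute` struct, the `attribute_consistent` function and attribute names such as "abi" are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A decoded attribute, e.g. taken from the notes of one object file.
   The struct and the attribute names used with it are hypothetical. */
struct attribute {
    const char *name;    /* e.g. "abi" */
    const char *value;   /* e.g. "lp64" */
};

/* Return 1 if every attribute called NAME has the same value (or there are
   none), 0 if two values differ. FIRST_MISMATCH, if non-NULL, receives the
   index of the first entry that disagrees with the first value seen. */
static int attribute_consistent(const struct attribute *attrs, size_t count,
                                const char *name, size_t *first_mismatch)
{
    const char *expected = NULL;
    assert(name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(attrs[i].name, name) != 0)
            continue;                       /* some other attribute */
        if (expected == NULL) {
            expected = attrs[i].value;      /* first occurrence sets the baseline */
        } else if (strcmp(attrs[i].value, expected) != 0) {
            if (first_mismatch != NULL)
                *first_mismatch = i;
            return 0;
        }
    }
    return 1;
}
```

A real check would first gather the attributes from the notes of each range in the binary. It could then report the address range of the first mismatch, which is exactly the information the per-range notes make available.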
How To Build It
The sources are available as a compressed tarball from https://nickc.fedorapeople.org/annobin-X.X.tar.xz, where X.X is the latest version number (currently 3.4).
Alternatively, the very latest sources can be found in the Annobin git repository at git://sourceware.org/git/annobin.git. Annobin is also available as a pre-built rpm in the Fedora distribution (from Fedora 27 onwards) and can be installed with the command: dnf install annobin
The sources are divided into several subdirectories:
- plugin - The sources for the GCC plugin.
- scripts - The example scripts.
- tests - A testsuite for the plugin and scripts.
- docs - Documentation.
- config - Files necessary to configure the sources.
Only the plugin actually needs to be built, and the usual "configure; make" sequence should suffice. The plugin has several dependencies, but the only unusual one is that it must be built by a version of GCC that supports plugins and provides the header files they need.
How To Extend It
The Watermark specification is designed to be extensible. Arbitrary notes can be added by anyone, or by any tool. They can be added at the time the binary is created, or at a later date. The easiest way is to create an assembler file with the note(s) to be added, and then assemble it to an object file. The file can then be included in the final link of the binary, or added to it by using the objcopy program (with its --merge-notes option).
A note in the assembler source file might look something like this:
.section .gnu.build.attributes, "", %note
.dc.l .Lname_end - .Lname_start # length of name field
.dc.l 0 # length of description field
.dc.l 0x100 # type = OPEN
.Lname_start:
.asciz "GA$<your-text-here>" # name field
.Lname_end:
.balign 4 # pad the name field to a 4-byte boundary
Or, if the note needs to cover a specific address range:
.section .gnu.build.attributes, "", %note
.dc.l .Lname_end - .Lname_start # length of name field
.dc.l 16 # length of description field
.dc.l 0x100 # type = OPEN
.Lname_start:
.asciz "GA$<your-text-here>" # name field
.Lname_end:
.balign 4 # pad the name field to a 4-byte boundary
.quad start_symbol # description field: start of the address range
.quad end_symbol # end of the address range
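When many notes have to be added, it may be easier to generate the assembler source than to write it by hand. The hypothetical C helper `emit_note` below prints one OPEN note in the same shape as the examples above.

A few details are assumptions to check against your assembler:
- It adds a `.balign 4` after the name, because ELF note fields are padded to 4-byte boundaries.
- It declares the section as a note section with `"", %note`, which is how we understand GNU as spells this.
- It appends a numeric suffix to the local labels, so that several notes can share one file.

```c
#include <assert.h>
#include <stdio.h>

/* Print one OPEN note (type 0x100) to OUT as GNU assembler source.
   ID makes the local labels unique, so several notes can share one file.
   If START_SYM and END_SYM are both non-NULL, a 16-byte range description is
   emitted; otherwise the description is empty. TEXT is placed after "GA$"
   and must not contain characters that need escaping in a .asciz string. */
static void emit_note(FILE *out, unsigned id, const char *text,
                      const char *start_sym, const char *end_sym)
{
    int has_range = (start_sym != NULL && end_sym != NULL);
    assert(out != NULL && text != NULL);
    fprintf(out, ".section .gnu.build.attributes, \"\", %%note\n");
    fprintf(out, ".dc.l .Lname_end%u - .Lname_start%u # length of name field\n", id, id);
    fprintf(out, ".dc.l %d # length of description field\n", has_range ? 16 : 0);
    fprintf(out, ".dc.l 0x100 # type = OPEN\n");
    fprintf(out, ".Lname_start%u:\n", id);
    fprintf(out, ".asciz \"GA$%s\" # name field\n", text);
    fprintf(out, ".Lname_end%u:\n", id);
    fprintf(out, ".balign 4 # pad the name field to a 4-byte boundary\n");
    if (has_range)
        fprintf(out, ".quad %s # description field\n.quad %s\n", start_sym, end_sym);
}
```

For example, `emit_note(stdout, 1, "example", "start_symbol", "end_symbol")` prints a note like the second example, and passing NULL for both symbols prints one like the first. The output can then be assembled and linked in, as described above.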
Future Steps
The Annobin project is still in development. Future plans include:
- Adding the ability for the assembler to insert annotation notes of its own. This will allow notes to be recorded for files that are not compiled by GCC (for example, assembler source files or files compiled with LLVM).
- Adding the ability to record source code hashes. During compilation each input file (headers as well as source files) would be hashed, possibly with SHA-256, and its name and hash value stored in the compiled binary. A consumer could then use the stored hash values to verify that the source code they have is the same source code that was used to compile the binary.
Links:
- Watermark specification - https://fedoraproject.org/wiki/Toolchain/Watermark
- Annobin git repository: git://sourceware.org/git/annobin.git