I work at Red Hat on GCC, the GNU Compiler Collection, and I spent most of the past year making GCC easier to use. Let's look at C and C++ improvements that will be in the next major release of GCC, GCC 9.
A new look for diagnostics
By way of example, let's look at how GCC 8 reports an attempt to use a missing binary "+" in C++:
$ gcc-8 t.cc
t.cc: In function ‘int test(const shape&, const shape&)’:
t.cc:15:4: error: no match for ‘operator+’ (operand types are ‘boxed_value<double>’ and ‘boxed_value<double>’)
   return (width(s1) * height(s1)
           ~~~~~~~~~~~~~~~~~~~~~~
    + width(s2) * height(s2));
    ^~~~~~~~~~~~~~~~~~~~~~~~
Here's what it looks like in GCC 9:
$ gcc-9 t.cc
t.cc: In function ‘int test(const shape&, const shape&)’:
t.cc:15:4: error: no match for ‘operator+’ (operand types are ‘boxed_value<double>’ and ‘boxed_value<double>’)
   14 |   return (width(s1) * height(s1)
      |           ~~~~~~~~~~~~~~~~~~~~~~
      |                     |
      |                     boxed_value<[...]>
   15 |    + width(s2) * height(s2));
      |    ^ ~~~~~~~~~~~~~~~~~~~~~~
      |                |
      |                boxed_value<[...]>
There are a few changes here. I've added a left-hand margin, showing line numbers. The "error" line mentions line 15, but the expression in question spans multiple lines, and we're actually starting with line 14. I think it's worth a little extra horizontal space to make it clear which line is which. It also helps distinguish your source code from the annotations that GCC emits. I believe the margin also makes it a little easier to see where each diagnostic starts, by visually breaking things up at the leftmost column.
Speaking of annotations, this example shows another new GCC 9 feature: diagnostics can label regions of the source code to show pertinent information. Here, what's most important are the types of the left-hand and right-hand sides of the "+" operator, so GCC highlights them inline. Notice how the diagnostic also uses color to distinguish the two operands from each other and the operator.
The left margin affects how we print things like fix-it hints for missing header files:
$ gcc-9 -xc++ -c incomplete.c
incomplete.c:1:6: error: ‘string’ in namespace ‘std’ does not name a type
    1 | std::string test(void)
      |      ^~~~~~
incomplete.c:1:1: note: ‘std::string’ is defined in header ‘<string>’; did you forget to ‘#include <string>’?
  +++ |+#include <string>
    1 | std::string test(void)
I've turned on these changes by default; they can be disabled via -fno-diagnostics-show-line-numbers and -fno-diagnostics-show-labels, respectively.
Another example can be seen in the type-mismatch error from the article I wrote last year, Usability improvements in GCC 8:
extern int callee(int one, const char *two, float three);

int caller(int first, int second, float third)
{
  return callee(first, second, third);
}
where the bogus type of the expression is now highlighted inline:
$ gcc-9 -c param-type-mismatch.c
param-type-mismatch.c: In function ‘caller’:
param-type-mismatch.c:5:24: warning: passing argument 2 of ‘callee’ makes pointer from integer without a cast [-Wint-conversion]
    5 |   return callee(first, second, third);
      |                        ^~~~~~
      |                        |
      |                        int
param-type-mismatch.c:1:40: note: expected ‘const char *’ but argument is of type ‘int’
    1 | extern int callee(int one, const char *two, float three);
      |                            ~~~~~~~~~~~~^~~
Yet another example can be seen in this bad printf call:
$ g++-9 -c bad-printf.cc -Wall
bad-printf.cc: In function ‘void print_field(const char*, float, long int, long int)’:
bad-printf.cc:6:17: warning: field width specifier ‘*’ expects argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]
    6 |   printf ("%s: %*ld ", fieldname, column - width, value);
      |                ~^~~               ~~~~~~~~~~~~~~
      |                 |                        |
      |                 int                      long int
bad-printf.cc:6:19: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 4 has type ‘double’ [-Wformat=]
    6 |   printf ("%s: %*ld ", fieldname, column - width, value);
      |                ~~~^                               ~~~~~
      |                   |                               |
      |                   long int                        double
      |                %*f
which contrasts "inline" the type expected by the format string versus what was passed in. (Embarrassingly, we didn't properly highlight format string locations in older versions of the C++ front end; for GCC 9, I've implemented this so it has parity with that of the C front end, as shown here).
Not just for humans
One concern I've heard when changing how GCC prints diagnostics is that it might break someone's script for parsing GCC output. I don't think these changes will do that: most such scripts are set up to parse the "FILENAME:LINE:COL: error: MESSAGE" lines and ignore the rest, and I'm not touching that part of the output.
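To make that concrete, here is a minimal sketch of the kind of line parser such scripts tend to contain, written as a C++17 function. The ParsedDiagnostic struct, the parse_diagnostic_line name, and the regular expression are illustrative assumptions rather than anything GCC ships; the pattern only recognizes "error", "warning", and "note" lines and assumes the filename contains no colon.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical record for one "FILENAME:LINE:COL: KIND: MESSAGE" line.
struct ParsedDiagnostic {
  std::string file;
  int line;
  int column;
  std::string kind;     // "error", "warning" or "note"
  std::string message;
};

// Returns the parsed fields, or std::nullopt for quoted source lines,
// carets, labels and anything else without the stable prefix.
std::optional<ParsedDiagnostic> parse_diagnostic_line(const std::string &text) {
  // The filename must not start with a space, so margin lines such as
  // "   14 |   return ..." can never match.
  static const std::regex pattern(
      R"(^([^: ][^:]*):(\d+):(\d+): (error|warning|note): (.*)$)");
  std::smatch m;
  if (!std::regex_match(text, m, pattern))
    return std::nullopt;
  return ParsedDiagnostic{m[1].str(), std::stoi(m[2].str()),
                          std::stoi(m[3].str()), m[4].str(), m[5].str()};
}
```

Quoted source lines, the left-hand margin, and the labels do not match a pattern shaped like this one, so a parser along these lines simply skips them.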
But it made me think it was about time we had a machine-readable output format for diagnostics, so for GCC 9, I've added a JSON output format: -fdiagnostics-format=json.
Consider this warning:
$ gcc-9 -c cve-2014-1266.c -Wall
cve-2014-1266.c: In function ‘SSLVerifySignedServerKeyExchange’:
cve-2014-1266.c:629:2: warning: this ‘if’ clause does not guard... [-Wmisleading-indentation]
  629 |  if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
      |  ^~
cve-2014-1266.c:631:3: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘if’
  631 |   goto fail;
      |   ^~~~
With -fdiagnostics-format=json, the diagnostics are emitted as a big blob of JSON to stderr. Running them through the handy python -m json.tool to format them gives an idea of the structure:
$ (gcc-9 -c cve-2014-1266.c -Wall -fdiagnostics-format=json 2>&1) | python -m json.tool | pygmentize -l json
[
    {
        "children": [
            {
                "kind": "note",
                "locations": [
                    {
                        "caret": {
                            "column": 3,
                            "file": "cve-2014-1266.c",
                            "line": 631
                        },
                        "finish": {
                            "column": 6,
                            "file": "cve-2014-1266.c",
                            "line": 631
                        }
                    }
                ],
                "message": "...this statement, but the latter is misleadingly indented as if it were guarded by the \u2018if\u2019"
            }
        ],
        "kind": "warning",
        "locations": [
            {
                "caret": {
                    "column": 2,
                    "file": "cve-2014-1266.c",
                    "line": 629
                },
                "finish": {
                    "column": 3,
                    "file": "cve-2014-1266.c",
                    "line": 629
                }
            }
        ],
        "message": "this \u2018if\u2019 clause does not guard...",
        "option": "-Wmisleading-indentation"
    }
]
In particular, the supplementary "note" is nested within the "warning" at the JSON level, allowing, for example, IDEs to group them. Some of our C++ diagnostics can have numerous child diagnostics giving additional detail, so being able to group them, for example, via a disclosure widget, could be helpful.
Simpler C++ errors
C++ is a complicated language. For example, the rules for figuring out which C++ function is to be invoked at a call site are non-trivial.
The compiler could need to consider several functions at a given call site, reject all of them for different reasons, and g++'s error messages have to cope with this generality, explaining why each was rejected.
This generality can make simple cases harder to read than they could be, so for GCC 9, I've added special-casing to simplify some g++ errors for common cases where there's just one candidate function.
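As a small reminder of what the compiler is juggling at a call site, here is a sketch with three overloads of a hypothetical function, describe, which exists only for this illustration. Each call has to be matched against every candidate before one is chosen.

```cpp
#include <string>

// Three candidates that the compiler must consider for every call.
std::string describe(int)          { return "int"; }
std::string describe(double)       { return "double"; }
std::string describe(const char *) { return "const char*"; }

// describe(42)    picks describe(int): exact match.
// describe('a')   picks describe(int): char promotes to int.
// describe(2.5f)  picks describe(double): float promotes to double.
// describe("x")   picks describe(const char *): array decays to pointer.
// describe(42L)   is rejected as ambiguous: long converts equally well
//                 to int and to double, and not at all to const char *.
```

With several candidates, a failed call needs a reason per candidate; with only one, a single direct message is enough.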
For example, GCC 8 could emit this:
$ g++-8 param-type-mismatch.cc
param-type-mismatch.cc: In function ‘int test(int, const char*, float)’:
param-type-mismatch.cc:8:45: error: no matching function for call to ‘foo::member_1(int&, const char*&, float&)’
   return foo::member_1 (first, second, third);
                                             ^
param-type-mismatch.cc:3:14: note: candidate: ‘static int foo::member_1(int, const char**, float)’
   static int member_1 (int one, const char **two, float three);
              ^~~~~~~~
param-type-mismatch.cc:3:14: note: no known conversion for argument 2 from ‘const char*’ to ‘const char**’
For GCC 9, I've special-cased this, giving a more direct error message, which highlights both the problematic argument and the parameter that it can't be converted to:
$ g++-9 param-type-mismatch.cc
param-type-mismatch.cc: In function ‘int test(int, const char*, float)’:
param-type-mismatch.cc:8:32: error: cannot convert ‘const char*’ to ‘const char**’
    8 |   return foo::member_1 (first, second, third);
      |                                ^~~~~~
      |                                |
      |                                const char*
param-type-mismatch.cc:3:46: note: initializing argument 2 of ‘static int foo::member_1(int, const char**, float)’
    3 |   static int member_1 (int one, const char **two, float three);
      |                                 ~~~~~~~~~~~~~^~~
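If you'd like to try this yourself, the following is a reconstruction of the kind of code that triggers the message. It is not the original param-type-mismatch.cc: the body of member_1, the call_member_1 wrapper, and the SHOW_ERROR macro are assumptions made for illustration, and the line numbers will differ from the output above.

```cpp
struct foo {
  // The only candidate: the second parameter is a pointer to a pointer.
  static int member_1(int one, const char **two, float three);
};

// Hypothetical body: adds the length of the string that 'two' points at.
int foo::member_1(int one, const char **two, float three) {
  int length = 0;
  for (const char *p = *two; *p != '\0'; ++p)
    ++length;
  return one + length + static_cast<int>(three);
}

int call_member_1(int first, const char *second, float third) {
#ifdef SHOW_ERROR
  // 'second' is a const char*, which cannot convert to const char**.
  return foo::member_1(first, second, third);
#else
  // Passing the address of the pointer matches the parameter type.
  return foo::member_1(first, &second, third);
#endif
}
```

Compiling with -DSHOW_ERROR should reproduce the conversion error; without it, the file compiles cleanly.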
Similarly, GCC 8 took two messages to offer suggestions for various kinds of misspelled names:
$ g++-8 typo.cc
typo.cc:5:13: error: ‘BUFSIZE’ was not declared in this scope
 uint8_t buf[BUFSIZE];
             ^~~~~~~
typo.cc:5:13: note: suggested alternative: ‘BUF_SIZE’
 uint8_t buf[BUFSIZE];
             ^~~~~~~
             BUF_SIZE
so for GCC 9, I've consolidated the messages:
$ g++-9 typo.cc
typo.cc:5:13: error: ‘BUFSIZE’ was not declared in this scope; did you mean ‘BUF_SIZE’?
    5 | uint8_t buf[BUFSIZE];
      |             ^~~~~~~
      |             BUF_SIZE
In some cases, where GCC 8 knew to offer suggestions within namespaces:
$ g++-8 typo-2.cc
typo-2.cc: In function ‘void mesh_to_strip()’:
typo-2.cc:8:3: error: ‘tri_strip’ was not declared in this scope
   tri_strip result;
   ^~~~~~~~~
typo-2.cc:8:3: note: suggested alternative:
typo-2.cc:2:9: note: ‘engine::tri_strip’
   class tri_strip {
         ^~~~~~~~~
GCC 9 can now offer fix-it hints:
$ g++-9 typo-2.cc
typo-2.cc: In function ‘void mesh_to_strip()’:
typo-2.cc:8:3: error: ‘tri_strip’ was not declared in this scope; did you mean ‘engine::tri_strip’?
    8 |   tri_strip result;
      |   ^~~~~~~~~
      |   engine::tri_strip
typo-2.cc:2:9: note: ‘engine::tri_strip’ declared here
    2 |   class tri_strip {
      |         ^~~~~~~~~
Location, location, location
A long-standing issue within GCC's internal representation is that not every node within the syntax tree has a source location.
For GCC 8, I added a way to ensure that every argument at a C++ call site has a source location.
For GCC 9, I've extended this work so that many more places in the C++ syntax tree now retain location information for longer.
This really helps when tracking down bad initializations. GCC 8 and earlier might unhelpfully emit errors on the final closing parenthesis or brace, for example:
$ g++-8 bad-inits.cc
bad-inits.cc:12:1: error: cannot convert ‘json’ to ‘int’ in initialization
 };
 ^
bad-inits.cc:14:47: error: initializer-string for array of chars is too long [-fpermissive]
 char buffers[3][5] = { "red", "green", "blue" };
                                               ^
bad-inits.cc: In constructor ‘X::X()’:
bad-inits.cc:17:35: error: invalid conversion from ‘int’ to ‘void*’ [-fpermissive]
   X() : one(42), two(42), three(42)
                                   ^
whereas now, GCC 9 can highlight exactly where the various problems are:
$ g++-9 bad-inits.cc
bad-inits.cc:10:14: error: cannot convert ‘json’ to ‘int’ in initialization
   10 |   { 3, json::object },
      |        ~~~~~~^~~~~~
      |              |
      |              json
bad-inits.cc:14:31: error: initializer-string for array of chars is too long [-fpermissive]
   14 | char buffers[3][5] = { "red", "green", "blue" };
      |                               ^~~~~~~
bad-inits.cc: In constructor ‘X::X()’:
bad-inits.cc:17:13: error: invalid conversion from ‘int’ to ‘void*’ [-fpermissive]
   17 |   X() : one(42), two(42), three(42)
      |             ^~
      |             |
      |             int
What is the optimizer doing?
GCC can automatically "vectorize" loops, reorganizing them to work on multiple iterations at once, to take advantage of the vector units on your CPU. However, it can do this only for some loops; if you stray from the path, GCC will have to use scalar code instead.
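To give a feel for the difference, here is a sketch of two loops; the function names add_arrays and prefix_sum are made up for this illustration. The first has independent iterations, which is the shape the vectorizer looks for. In the second, each iteration needs the result of the previous one, so it can't simply be processed several iterations at a time. Whether GCC actually vectorizes either one depends on the GCC version, the target, and the flags you pass.

```cpp
#include <cstddef>

// Independent iterations: out[i] depends only on a[i] and b[i], so
// several elements could be computed with one vector instruction.
void add_arrays(const float *a, const float *b, float *out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = a[i] + b[i];
}

// Loop-carried dependency: 'running' from iteration i - 1 feeds
// iteration i, which is a much harder case for a vectorizer.
void prefix_sum(const float *in, float *out, std::size_t n) {
  float running = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    running += in[i];
    out[i] = running;
  }
}
```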
Unfortunately, historically it hasn't been easy to get a sense from GCC about the decisions it's making as it's optimizing your code. We have an option, -fopt-info, that emits optimization information, but it's been more of a tool for the developers of GCC itself, rather than something aimed at end users.
Consider this (contrived) example:
#define N 1024

void test (int *p, int *q)
{
  int i;

  for (i = 0; i < N; i++)
    {
      p[i] = q[i];
      asm volatile ("" ::: "memory");
    }
}
I tried compiling it with GCC 8 with -O3 -fopt-info-all-vec, but it wasn't very enlightening:
$ gcc-8 -c v.c -O3 -fopt-info-all-vec
Analyzing loop at v.c:7
v.c:7:3: note: ===== analyze_loop_nest =====
v.c:7:3: note: === vect_analyze_loop_form ===
v.c:7:3: note: === get_loop_niters ===
v.c:7:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
v.c:3:6: note: vectorized 0 loops in function.
v.c:3:6: note: ===vect_slp_analyze_bb===
v.c:3:6: note: ===vect_slp_analyze_bb===
v.c:10:7: note: === vect_analyze_data_refs ===
v.c:10:7: note: got vectype for stmt: _5 = *_3; vector(4) int
v.c:10:7: note: got vectype for stmt: *_4 = _5; vector(4) int
v.c:10:7: note: === vect_analyze_data_ref_accesses ===
v.c:10:7: note: not consecutive access _5 = *_3;
v.c:10:7: note: not consecutive access *_4 = _5;
v.c:10:7: note: not vectorized: no grouped stores in basic block.
v.c:7:3: note: === vect_analyze_data_refs ===
v.c:7:3: note: not vectorized: not enough data-refs in basic block.
v.c:7:3: note: ===vect_slp_analyze_bb===
v.c:7:3: note: ===vect_slp_analyze_bb===
v.c:12:1: note: === vect_analyze_data_refs ===
v.c:12:1: note: not vectorized: not enough data-refs in basic block.
For GCC 9, I've reorganized problem-tracking within the vectorizer so that the output is of the form:
[LOOP-LOCATION]: couldn't vectorize this loop
[PROBLEM-LOCATION]: because of [REASON]
For the example above, this gives the following, identifying the location of the construct within the loop that the vectorizer couldn't handle. (I hoped to have it also show the source code, but that didn't make feature freeze):
$ gcc-9 -c v.c -O3 -fopt-info-all-vec
v.c:7:3: missed: couldn't vectorize loop
v.c:10:7: missed: statement clobbers memory: __asm__ __volatile__("" : : : "memory");
v.c:3:6: note: vectorized 0 loops in function.
v.c:10:7: missed: statement clobbers memory: __asm__ __volatile__("" : : : "memory");
This improves things, but still has some limitations, so for GCC 9 I've also added a new option to emit machine-readable optimization information: -fsave-optimization-record.
This writes out a SRCFILE.opt-record.json.gz file with much richer data: for example, every message is tagged with profile information (if available), so that you can look at the "hottest" part of the code, and it captures inlining information, so that if a function has been inlined into several places, you can see how each instance of the function has been optimized.
Other improvements
GCC can emit "fix-it hints" that suggest how to fix a problem in your code. These can be automatically applied by an IDE.
For GCC 9, I've added various new fix-it hints. There are now fix-it hints for forgetting the return *this; needed by various C++ operators:
$ g++-9 -c operator.cc
operator.cc: In member function ‘boxed_ptr& boxed_ptr::operator=(const boxed_ptr&)’:
operator.cc:7:3: warning: no return statement in function returning non-void [-Wreturn-type]
    6 |     m_ptr = other.m_ptr;
  +++ |+    return *this;
    7 |   }
      |   ^
and for when the compiler needs a typename:
$ g++-9 -c template.cc
template.cc:3:3: error: need ‘typename’ before ‘Traits::type’ because ‘Traits’ is a dependent scope
    3 |   Traits::type type;
      |   ^~~~~~
      |   typename
and when you try to use an accessor member as if it were a data member:
$ g++-9 -c fncall.cc
fncall.cc: In function ‘void hangman(const mystring&)’:
fncall.cc:12:11: error: invalid use of member function ‘int mystring::get_length() const’ (did you forget the ‘()’ ?)
   12 |   if (str.get_length > 0)
      |       ~~~~^~~~~~~~~~
      |                     ()
and for C++11's scoped enums:
$ g++-9 -c enums.cc
enums.cc: In function ‘void json::test(const json::value&)’:
enums.cc:12:26: error: ‘STRING’ was not declared in this scope; did you mean ‘json::kind::STRING’?
   12 |     if (v.get_kind () == STRING)
      |                          ^~~~~~
      |                          json::kind::STRING
enums.cc:3:44: note: ‘json::kind::STRING’ declared here
    3 |   enum class kind { OBJECT, ARRAY, NUMBER, STRING, TRUE, FALSE, NULL_ };
      |                                            ^~~~~~
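For reference, here is a sketch of what code looks like once the four hints above have been applied. The original test files aren't shown in this article, so the class bodies are reconstructions for illustration only, and int_traits, holder, is_non_empty, value, and is_string are hypothetical names.

```cpp
#include <string>

// 1. An assignment operator ends with 'return *this;' so that
//    assignments can be chained.
class boxed_ptr {
public:
  boxed_ptr &operator=(const boxed_ptr &other) {
    m_ptr = other.m_ptr;
    return *this;
  }
  void *m_ptr = nullptr;
};

// 2. A type name that depends on a template parameter needs 'typename'.
struct int_traits {
  typedef int type;
};

template <typename Traits>
struct holder {
  typename Traits::type value;
};

// 3. An accessor is a member function, so it is called with '()'.
class mystring {
public:
  explicit mystring(const std::string &text) : m_text(text) {}
  int get_length() const { return static_cast<int>(m_text.size()); }

private:
  std::string m_text;
};

bool is_non_empty(const mystring &str) {
  return str.get_length() > 0;
}

// 4. Enumerators of a scoped enum must be qualified with the enum's name.
namespace json {
enum class kind { OBJECT, ARRAY, NUMBER, STRING, TRUE, FALSE, NULL_ };

struct value {
  kind m_kind;
  kind get_kind() const { return m_kind; }
};

bool is_string(const value &v) {
  return v.get_kind() == json::kind::STRING;
}
}  // namespace json
```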
And I added a tweak to integrate the suggestions about misspelled members with that for accessors:
$ g++-9 -c accessor-fixit.cc
accessor-fixit.cc: In function ‘int test(t*)’:
accessor-fixit.cc:17:15: error: ‘class t’ has no member named ‘ratio’; did you mean ‘int t::m_ratio’? (accessible via ‘int t::get_ratio() const’)
   17 |   return ptr->ratio;
      |               ^~~~~
      |               get_ratio()
I've also tweaked the suggestions code so it considers transposed letters, so it should do a better job of figuring out misspellings.
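One common way to account for transposed letters is the "optimal string alignment" variant of edit distance, which counts swapping two adjacent characters as a single edit. The sketch below is not GCC's code; the edit_distance function and its allow_transpositions flag are assumptions made to show why treating a swap as one edit helps rank misspellings.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Edit distance between a and b. Insertions, deletions and substitutions
// each cost 1; when allow_transpositions is true, swapping two adjacent
// characters also costs 1 (optimal string alignment).
std::size_t edit_distance(const std::string &a, const std::string &b,
                          bool allow_transpositions) {
  const std::size_t n = a.size();
  const std::size_t m = b.size();
  // d[i][j] holds the distance between the first i characters of a
  // and the first j characters of b.
  std::vector<std::vector<std::size_t>> d(n + 1,
                                          std::vector<std::size_t>(m + 1, 0));
  for (std::size_t i = 0; i <= n; ++i)
    d[i][0] = i;
  for (std::size_t j = 0; j <= m; ++j)
    d[0][j] = j;

  for (std::size_t i = 1; i <= n; ++i) {
    for (std::size_t j = 1; j <= m; ++j) {
      const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
      d[i][j] = std::min({d[i - 1][j] + 1,          // deletion
                          d[i][j - 1] + 1,          // insertion
                          d[i - 1][j - 1] + cost}); // substitution or match
      if (allow_transpositions && i > 1 && j > 1 &&
          a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
        d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);  // adjacent swap
    }
  }
  return d[n][m];
}
```

With plain edit distance, a typo like "raito" for "ratio" costs two substitutions; with transpositions allowed it costs one, so it ranks as a closer match.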
Looking to the future
The above covers some of the changes I've made for GCC 9.
Perhaps a deeper change is that we now have a set of user experience guidelines for GCC, to try to keep a focus on the programmer's experience as we implement new diagnostics. If you'd like to get involved in GCC development, please join us on the GCC mailing list. Hacking on diagnostics is a great way to get started.
Trying it out
GCC 9 will be in Fedora 30, which should be out in a few weeks.
For simple code examples, you can play around with the new GCC at https://godbolt.org/ (select GCC "trunk").
Have fun!
Last updated: March 7, 2019