In an earlier post, I talked about GCC (the GNU Compiler Collection), and work that I did to make the internals of GCC more robust for GCC 5.
This post is about something more user-visible: as of GCC 5, GCC's code-generation backend can now be built as a shared library.
When might you want to generate machine code? The primary reason is for speed: anything that parses an input format and repeatedly acts on it, such as language interpreter, or a regular expression engine, might benefit from doing some work up-front to translate the input (or some part of it) directly into machine code.
The new library in GCC 5 is named libgccjit, since it can be used to implement Just-In-Time compilation within a language interpreter. Although I primarily had JIT-compilation in mind when writing it, the library can also write the code it generates out to disk as an executable, allowing you to build more conventional ahead-of-time compilers.
As the author, I may be biased, but I believe libgccjit makes it relatively easy to build a compiler, so, as a challenge, let's see if we can write a compiler in one blog post...
To make it easy, let's construct a compiler for an esoteric programming language that we shall refer to as “brainf”.
Here's what a simple .b script looks like:
[
Emit the uppercase alphabet
]
cell 0 = 26
++++++++++++++++++++++++++
cell 1 = 65
>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++<
while cell#0 != 0
[
>
. emit cell#1
+ increment cell@1
<- decrement cell@0
]
This example makes use of whitespace and comments for legibility, but could have been written as:
++++++++++++++++++++++++++
>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++<
[>.+<-]
Clearly it's not a particularly useful language, except for providing compiler-writers with something that's easy to parse.
brainf scripts operate on an array of bytes, with a notional data pointer within the array. The operations are:
Character | Meaning |
---|---|
> | idx += 1 |
< | idx -= 1 |
+ | data[idx] += 1 |
- | data[idx] -= 1 |
. | output (data[idx]) |
, | data[idx] = input () |
[ | loop until data[idx] == 0 |
] | end of loop |
Anything else | ignored |
The libgccjit API is currently accessible in various ways:
- from a C header file (the core API).
- via a C++ wrapper API.
- via Python bindings.
- via D bindings.
To keep things as simple as possible, we'll write our compiler in Python.
We use the “gccjit” Python module to call into libgccjit, to populate a gccjit.Context, building a function named “func” in gccjit's internal representation. We then compile the gccjit.Context, either to an in-process machine code function, or to an executable on disk.
class Paren: def __init__(self, b_test, b_body, b_after): self.b_test = b_test self.b_body = b_body self.b_after = b_after class CompileError(Exception): def __init__(self, compiler, msg): self.filename = compiler.filename self.line = compiler.line self.column = compiler.column self.msg = msg def __str__(self): return ("%s:%i:%i: %s" % (self.filename, self.line, self.column, self.msg)) class Compiler: def __init__(self): self.ctxt = gccjit.Context() if 1: self.ctxt.set_int_option(gccjit.IntOption.OPTIMIZATION_LEVEL, 3); self.ctxt.set_bool_option(gccjit.BoolOption.DUMP_INITIAL_GIMPLE, 0); self.ctxt.set_bool_option(gccjit.BoolOption.DUMP_GENERATED_CODE, 0); self.ctxt.set_bool_option(gccjit.BoolOption.DEBUGINFO, 1); self.ctxt.set_bool_option(gccjit.BoolOption.DUMP_EVERYTHING, 0); self.ctxt.set_bool_option(gccjit.BoolOption.KEEP_INTERMEDIATES, 0); self.void_type = self.ctxt.get_type(gccjit.TypeKind.VOID) self.int_type = self.ctxt.get_type(gccjit.TypeKind.INT) self.byte_type = self.ctxt.get_type(gccjit.TypeKind.UNSIGNED_CHAR) self.array_type = self.ctxt.new_array_type(self.byte_type, 30000) self.func_getchar = ( self.ctxt.new_function(gccjit.FunctionKind.IMPORTED, self.int_type, b"getchar", [])) self.func_putchar = ( self.ctxt.new_function(gccjit.FunctionKind.IMPORTED, self.void_type, b"putchar", [self.ctxt.new_param(self.int_type, b"c")])) self.func = self.ctxt.new_function(gccjit.FunctionKind.EXPORTED, self.void_type, b'func', []) self.curblock = self.func.new_block(b"initial") self.int_zero = self.ctxt.zero(self.int_type) self.int_one = self.ctxt.one(self.int_type) self.byte_zero = self.ctxt.zero(self.byte_type) self.byte_one = self.ctxt.one(self.byte_type) self.data_cells = self.ctxt.new_global(gccjit.GlobalKind.INTERNAL, self.array_type, b"data_cells") self.idx = self.func.new_local(self.int_type, b"idx") self.open_parens = [] self.curblock.add_comment(b"idx = 0;") self.curblock.add_assignment(self.idx, self.int_zero) def get_current_data(self, loc): """Get 'data_cells[idx]' as an lvalue. """ return self.ctxt.new_array_access(self.data_cells, self.idx, loc) def current_data_is_zero(self, loc): """Get 'data_cells[idx] == 0' as a boolean rvalue.""" return self.ctxt.new_comparison(gccjit.Comparison.EQ, self.get_current_data(loc), self.byte_zero, loc) def compile_char(self, ch): """Compile one bf character.""" loc = self.ctxt.new_location(self.filename, self.line, self.column) # Turn this on to trace execution, by injecting putchar() # of each source char. if 0: arg = self.ctxt.new_rvalue_from_int (self.int_type, ch) call = self.ctxt.new_call (self.func_putchar, [arg], loc) self.curblock.add_eval (call, loc) if ch == '>': self.curblock.add_comment(b"'>': idx += 1;", loc) self.curblock.add_assignment_op(self.idx, gccjit.BinaryOp.PLUS, self.int_one, loc) elif ch == '<': self.curblock.add_comment(b"'<': idx -= 1;", loc) self.curblock.add_assignment_op(self.idx, gccjit.BinaryOp.MINUS, self.int_one, loc) elif ch == '+': self.curblock.add_comment(b"'+': data[idx] += 1;", loc) self.curblock.add_assignment_op(self.get_current_data (loc), gccjit.BinaryOp.PLUS, self.byte_one, loc) elif ch == '-': self.curblock.add_comment(b"'-': data[idx] -= 1;", loc) self.curblock.add_assignment_op(self.get_current_data(loc), gccjit.BinaryOp.MINUS, self.byte_one, loc) elif ch == '.': arg = self.ctxt.new_cast(self.get_current_data(loc), self.int_type, loc) call = self.ctxt.new_call(self.func_putchar, [arg], loc) self.curblock.add_comment(b"'.': putchar ((int)data[idx]);", loc) self.curblock.add_eval(call, loc) elif ch == ',': call = self.ctxt.new_call(self.func_getchar, [], loc) self.curblock.add_comment(b"',': data[idx] = (unsigned char)getchar ();", loc) self.curblock.add_assignment(self.get_current_data(loc), self.ctxt.new_cast(call, self.byte_type, loc), loc) elif ch == '[': loop_test = self.func.new_block() on_zero = self.func.new_block() on_non_zero = self.func.new_block() self.curblock.end_with_jump(loop_test, loc) loop_test.add_comment(b"'['", loc) loop_test.end_with_conditional(self.current_data_is_zero(loc), on_zero, on_non_zero, loc) self.open_parens.append(Paren(loop_test, on_non_zero, on_zero)) self.curblock = on_non_zero; elif ch == ']': self.curblock.add_comment(b"']'", loc) if not self.open_parens: raise CompileError(self, "mismatching parens") paren = self.open_parens.pop() self.curblock.end_with_jump(paren.b_test) self.curblock = paren.b_after elif ch == 'n': self.line +=1; self.column = 0; if ch != 'n': self.column += 1 def parse_into_ctxt(self, filename): """ Parse the given .bf file into the gccjit.Context, containing a single function "func". """ self.filename = filename; self.line = 1 self.column = 0 with open(filename) as f_in: for ch in f_in.read(): self.compile_char(ch) self.curblock.end_with_void_return() # Compiling to an executable def compile_to_file(self, output_path): # Wrap "func" up in a "main" function mainfunc, argv, argv = gccjit.make_main(self.ctxt) block = mainfunc.new_block() block.add_eval(self.ctxt.new_call(self.func, [])) block.end_with_return(self.int_zero) self.ctxt.compile_to_file(gccjit.OutputKind.EXECUTABLE, output_path) # Running the generated code in-process def run(self): import ctypes result = self.ctxt.compile() py_func_type = ctypes.CFUNCTYPE(None) py_func = py_func_type(result.get_code(b'func')) py_func() # Entrypoint def main(argv): from optparse import OptionParser parser = OptionParser() parser.add_option("-o", "--output", dest="outputfile", help="compile to FILE", metavar="FILE") (options, args) = parser.parse_args() if len(args) != 1: raise ValueError('No input file') inputfile = args[0] c = Compiler() c.parse_into_ctxt(inputfile) if options.outputfile: c.compile_to_file(options.outputfile) else: c.run() if __name__ == '__main__': try: main(sys.argv) except Exception as exc: print(exc) sys.exit(1)
The above script is a brainf-to-machine-code compiler.
We can use it to compile a .b script and run it in-process. This is a similar to how a language interpreter or virtual machine might convert frequently-executed functions into machine code:
$ python bf.py
emit-alphabet.b
ABCDEFGHIJKLMNOPQRSTUVWXYZ
Success!
As well, as this Just-In-Time compilation use-case, we can also use it to compile .bf files into machine code executables:
$ python bf.py
emit-alphabet.b
-o a.out
which can then be run independently:
$ ./a.out
ABCDEFGHIJKLMNOPQRSTUVWXYZ
and we can inspect them using standard tools.
$ file a.out
a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not stripped
For example, we can look at a disassembly:
$ objdump -d a.out |less
which shows that libgccjit has managed to optimize the function somewhat (for example, the runs of 26 and 65 increment operations have become integer constants 0x1a and 0x41):
0000000000400620 <func>:
400620: 80 3d 39 0a 20 00 00 cmpb $0x0,0x200a39(%rip) # 601060 <data_cells>
400627: 74 07 je 400630 <func+0x10>
400629: eb fe jmp 400629 <func+0x9>
40062b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
400630: 48 83 ec 08 sub $0x8,%rsp
400634: 0f b6 05 26 0a 20 00 movzbl 0x200a26(%rip),%eax # 601061 <data_cells+0x1>
40063b: c6 05 1e 0a 20 00 1a movb $0x1a,0x200a1e(%rip) # 601060 <data_cells>
400642: 8d 78 41 lea 0x41(%rax),%edi
400645: 40 88 3d 15 0a 20 00 mov %dil,0x200a15(%rip) # 601061 <data_cells+0x1>
40064c: 0f 1f 40 00 nopl 0x0(%rax)
400650: 40 0f b6 ff movzbl %dil,%edi
400654: e8 87 fe ff ff callq 4004e0 <putchar@plt>
400659: 0f b6 05 01 0a 20 00 movzbl 0x200a01(%rip),%eax # 601061 <data_cells+0x1>
400660: 80 2d f9 09 20 00 01 subb $0x1,0x2009f9(%rip) # 601060 <data_cells>
400667: 8d 78 01 lea 0x1(%rax),%edi
40066a: 40 88 3d f0 09 20 00 mov %dil,0x2009f0(%rip) # 601061 <data_cells+0x1>
400671: 75 dd jne 400650 <func+0x30>
400673: 48 83 c4 08 add $0x8,%rsp
400677: c3 retq
400678: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40067f: 00
0000000000400680 <main>:
400680: 48 83 ec 08 sub $0x8,%rsp
400684: e8 97 ff ff ff callq 400620 <func>
400689: 31 c0 xor %eax,%eax
40068b: 48 83 c4 08 add $0x8,%rsp
40068f: c3 retq
We also set up debugging information (via gccjit.Context.new_location and gccjit.BoolOption.DEBUGINFO), so it's possible to use gdb to singlestep through the generated binary and inspect the internal state idx and data_cells:
(gdb) break main
Breakpoint 1 at 0x400790
(gdb) run
Starting program: a.out
Breakpoint 1, 0x0000000000400790 in main (argc=1, argv=0x7fffffffe448)
(gdb) stepi
0x0000000000400797 in main (argc=1, argv=0x7fffffffe448)
(gdb) stepi
0x00000000004007a0 in main (argc=1, argv=0x7fffffffe448)
(gdb) stepi
9 >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++<
(gdb) list
4
5 cell 0 = 26
6 ++++++++++++++++++++++++++
7
8 cell 1 = 65
9 >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++<
10
11 while cell#0 != 0
12 [
13 >
(gdb) n
6 ++++++++++++++++++++++++++
(gdb) n
9 >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++<
(gdb) p idx
$1 = 1
(gdb) p data_cells
$2 = "32", '00' <repeats 29998 times>
(gdb) p data_cells[0]
$3 = 26 '32'
(gdb) p data_cells[1]
$4 = 0 '00'
(gdb) list
4
5 cell 0 = 26
6 ++++++++++++++++++++++++++
7
8 cell 1 = 65
9 >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++<
10
11 while cell#0 != 0
12 [
13 >
Let's try another brainf script (by Erik Bosman):
$ time python examples/bf.py mandelbrot.b
AAAAAAAAAAAAAAAABBBBBBBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDEGFFEEEEDDDDDDCCCCCCCCCBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
AAAAAAAAAAAAAAABBBBBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDEEEFGIIGFFEEEDDDDDDDDCCCCCCCCCBBBBBBBBBBBBBBBBBBBBBBBBBB
AAAAAAAAAAAAABBBBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDEEEEFFFI KHGGGHGEDDDDDDDDDCCCCCCCCCBBBBBBBBBBBBBBBBBBBBBBB
AAAAAAAAAAAABBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDEEEEEFFGHIMTKLZOGFEEDDDDDDDDDCCCCCCCCCBBBBBBBBBBBBBBBBBBBBB
AAAAAAAAAAABBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDEEEEEEFGGHHIKPPKIHGFFEEEDDDDDDDDDCCCCCCCCCCBBBBBBBBBBBBBBBBBB
AAAAAAAAAABBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDEEEEEEFFGHIJKS X KHHGFEEEEEDDDDDDDDDCCCCCCCCCCBBBBBBBBBBBBBBBB
AAAAAAAAABBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDEEEEEEFFGQPUVOTY ZQL[MHFEEEEEEEDDDDDDDCCCCCCCCCCCBBBBBBBBBBBBBB
AAAAAAAABBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDEEEEEFFFFFGGHJLZ UKHGFFEEEEEEEEDDDDDCCCCCCCCCCCCBBBBBBBBBBBB
AAAAAAABBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDEEEEFFFFFFGGGGHIKP KHHGGFFFFEEEEEEDDDDDCCCCCCCCCCCBBBBBBBBBBB
AAAAAAABBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDEEEEEFGGHIIHHHHHIIIJKMR VMKJIHHHGFFFFFFGSGEDDDDCCCCCCCCCCCCBBBBBBBBB
AAAAAABBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDEEEEEEFFGHK MKJIJO N R X YUSR PLV LHHHGGHIOJGFEDDDCCCCCCCCCCCCBBBBBBBB
AAAAABBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDEEEEEEEEEFFFFGH O TN S NKJKR LLQMNHEEDDDCCCCCCCCCCCCBBBBBBB
AAAAABBCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDEEEEEEEEEEEEFFFFFGHHIN Q UMWGEEEDDDCCCCCCCCCCCCBBBBBB
AAAABBCCCCCCCCCCCCCCCCCCCCCCCCCDDDDEEEEEEEEEEEEEEEFFFFFFGHIJKLOT [JGFFEEEDDCCCCCCCCCCCCCBBBBB
AAAABCCCCCCCCCCCCCCCCCCCCCCDDDDEEEEEEEEEEEEEEEEFFFFFFGGHYV RQU QMJHGGFEEEDDDCCCCCCCCCCCCCBBBB
AAABCCCCCCCCCCCCCCCCCDDDDDDDEEFJIHFFFFFFFFFFFFFFGGGGGGHIJN JHHGFEEDDDDCCCCCCCCCCCCCBBB
AAABCCCCCCCCCCCDDDDDDDDDDEEEEFFHLKHHGGGGHHMJHGGGGGGHHHIKRR UQ L HFEDDDDCCCCCCCCCCCCCCBB
AABCCCCCCCCDDDDDDDDDDDEEEEEEFFFHKQMRKNJIJLVS JJKIIIIIIJLR YNHFEDDDDDCCCCCCCCCCCCCBB
AABCCCCCDDDDDDDDDDDDEEEEEEEFFGGHIJKOU O O PR LLJJJKL OIHFFEDDDDDCCCCCCCCCCCCCCB
AACCCDDDDDDDDDDDDDEEEEEEEEEFGGGHIJMR RMLMN NTFEEDDDDDDCCCCCCCCCCCCCB
AACCDDDDDDDDDDDDEEEEEEEEEFGGGHHKONSZ QPR NJGFEEDDDDDDCCCCCCCCCCCCCC
ABCDDDDDDDDDDDEEEEEFFFFFGIPJIIJKMQ VX HFFEEDDDDDDCCCCCCCCCCCCCC
ACDDDDDDDDDDEFFFFFFFGGGGHIKZOOPPS HGFEEEDDDDDDCCCCCCCCCCCCCC
ADEEEEFFFGHIGGGGGGHHHHIJJLNY TJHGFFEEEDDDDDDDCCCCCCCCCCCCC
A PLJHGGFFEEEDDDDDDDCCCCCCCCCCCCC
ADEEEEFFFGHIGGGGGGHHHHIJJLNY TJHGFFEEEDDDDDDDCCCCCCCCCCCCC
ACDDDDDDDDDDEFFFFFFFGGGGHIKZOOPPS HGFEEEDDDDDDCCCCCCCCCCCCCC
ABCDDDDDDDDDDDEEEEEFFFFFGIPJIIJKMQ VX HFFEEDDDDDDCCCCCCCCCCCCCC
AACCDDDDDDDDDDDDEEEEEEEEEFGGGHHKONSZ QPR NJGFEEDDDDDDCCCCCCCCCCCCCC
AACCCDDDDDDDDDDDDDEEEEEEEEEFGGGHIJMR RMLMN NTFEEDDDDDDCCCCCCCCCCCCCB
AABCCCCCDDDDDDDDDDDDEEEEEEEFFGGHIJKOU O O PR LLJJJKL OIHFFEDDDDDCCCCCCCCCCCCCCB
AABCCCCCCCCDDDDDDDDDDDEEEEEEFFFHKQMRKNJIJLVS JJKIIIIIIJLR YNHFEDDDDDCCCCCCCCCCCCCBB
AAABCCCCCCCCCCCDDDDDDDDDDEEEEFFHLKHHGGGGHHMJHGGGGGGHHHIKRR UQ L HFEDDDDCCCCCCCCCCCCCCBB
AAABCCCCCCCCCCCCCCCCCDDDDDDDEEFJIHFFFFFFFFFFFFFFGGGGGGHIJN JHHGFEEDDDDCCCCCCCCCCCCCBBB
AAAABCCCCCCCCCCCCCCCCCCCCCCDDDDEEEEEEEEEEEEEEEEFFFFFFGGHYV RQU QMJHGGFEEEDDDCCCCCCCCCCCCCBBBB
AAAABBCCCCCCCCCCCCCCCCCCCCCCCCCDDDDEEEEEEEEEEEEEEEFFFFFFGHIJKLOT [JGFFEEEDDCCCCCCCCCCCCCBBBBB
AAAAABBCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDEEEEEEEEEEEEFFFFFGHHIN Q UMWGEEEDDDCCCCCCCCCCCCBBBBBB
AAAAABBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDEEEEEEEEEFFFFGH O TN S NKJKR LLQMNHEEDDDCCCCCCCCCCCCBBBBBBB
AAAAAABBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDEEEEEEFFGHK MKJIJO N R X YUSR PLV LHHHGGHIOJGFEDDDCCCCCCCCCCCCBBBBBBBB
AAAAAAABBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDEEEEEFGGHIIHHHHHIIIJKMR VMKJIHHHGFFFFFFGSGEDDDDCCCCCCCCCCCCBBBBBBBBB
AAAAAAABBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDEEEEFFFFFFGGGGHIKP KHHGGFFFFEEEEEEDDDDDCCCCCCCCCCCBBBBBBBBBBB
AAAAAAAABBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDEEEEEFFFFFGGHJLZ UKHGFFEEEEEEEEDDDDDCCCCCCCCCCCCBBBBBBBBBBBB
AAAAAAAAABBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDEEEEEEFFGQPUVOTY ZQL[MHFEEEEEEEDDDDDDDCCCCCCCCCCCBBBBBBBBBBBBBB
AAAAAAAAAABBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDEEEEEEFFGHIJKS X KHHGFEEEEEDDDDDDDDDCCCCCCCCCCBBBBBBBBBBBBBBBB
AAAAAAAAAAABBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDEEEEEEFGGHHIKPPKIHGFFEEEDDDDDDDDDCCCCCCCCCCBBBBBBBBBBBBBBBBBB
AAAAAAAAAAAABBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDEEEEEFFGHIMTKLZOGFEEDDDDDDDDDCCCCCCCCCBBBBBBBBBBBBBBBBBBBBB
AAAAAAAAAAAAABBBBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDEEEEFFFI KHGGGHGEDDDDDDDDDCCCCCCCCCBBBBBBBBBBBBBBBBBBBBBBB
AAAAAAAAAAAAAAABBBBBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDEEEFGIIGFFEEEDDDDDDDDCCCCCCCCCBBBBBBBBBBBBBBBBBBBBBBBBBB
real 0m10.639s
user 0m10.458s
sys 0m0.150s
Compiling it to an a.out and measuring timings shows that most of this time is spent optimizing it to machine code, with the resulting code taking about a second and a half to to run:
$ time python examples/bf.py mandelbrot.bf -o a.out
real 0m9.419s
user 0m9.223s
sys 0m0.166s
$ time (./a.out > /dev/null)
real 0m1.470s
user 0m1.465s
sys 0m0.003s
So there you have it, with libgccjit we were able to build an optimizing compiler, with debugger support, in 230 lines of Python.
Trying it out
GCC 5 and libgccjit will be available in the forthcoming Fedora 22 release.
The Python bindings can be downloaded from https://pypi.org/project/gccjit/
The code above can be seen at
https://github.com/davidmalcolm/pygccjit/blob/master/examples/bf.py