Brainfuck is a very interesting programming language that has only 8 operators:
Operator | Code in C/C++
---|---
`+` | `buff[ptr]++`
`-` | `buff[ptr]--`
`>` | `ptr++`
`<` | `ptr--`
`[` | `if (!buff[ptr]) goto ']'`
`]` | `if (buff[ptr]) goto '['`
`,` | `buff[ptr] = getchar()`
`.` | `putchar(buff[ptr])`
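As a quick hand-translated example (this fragment is mine, not from the project), the bf fragment `+++>.` maps to:

```cpp
#include <cstddef>
#include <cstdio>

unsigned char buff[0x20000];
std::size_t ptr = 0;

// Hand translation of the bf fragment "+++>." using the table above.
void run_fragment() {
    buff[ptr]++; buff[ptr]++; buff[ptr]++; // +++
    ptr++;                                 // >
    putchar(buff[ptr]);                    // .
}
```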
This simple syntax makes brainfuck a great language for me to learn how to build an interpreter and a JIT (just-in-time) compiler.
This project has a simple interpreter for brainfuck, using switch-threading:
```cpp
for (size_t i = 0; i < code.size(); ++i) {
    switch (code[i].op) {
        case op_add: buff[p] += code[i].num; break;
        case op_sub: buff[p] -= code[i].num; break;
        case op_addp: p += code[i].num; break;
        case op_subp: p -= code[i].num; break;
        case op_jt: if (buff[p]) i = code[i].num; break;
        case op_jf: if (!buff[p]) i = code[i].num; break;
        case op_in: buff[p] = getchar(); break;
        case op_out: putchar(buff[p]); break;
    }
}
```
To improve the interpreter's efficiency, I count runs of consecutive operators instead of translating each operator into a single opcode. You can see the structure of the opcodes in `jit.cpp`.
For example:
bf code | opcode
---|---
`+++` | `buf[p] += 3`
`----` | `buf[p] -= 4`
`>>>>>` | `p += 5`
`<<` | `p -= 2`
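The run-length compaction described above might look roughly like this (an illustrative sketch with a hypothetical `opcode` struct; the real layout is in `jit.cpp`):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical opcode layout; see jit.cpp for the project's real one.
enum op_type { op_add, op_sub, op_addp, op_subp, op_jt, op_jf, op_in, op_out };
struct opcode { op_type op; uint32_t num; };

// Count runs of '+', '-', '>', '<' and emit one opcode per run.
std::vector<opcode> compact(const std::string& src) {
    std::vector<opcode> code;
    for (size_t i = 0; i < src.size();) {
        const char c = src[i];
        if (c == '+' || c == '-' || c == '>' || c == '<') {
            uint32_t n = 0;
            while (i < src.size() && src[i] == c) { ++n; ++i; }
            const op_type t = (c == '+') ? op_add : (c == '-') ? op_sub
                            : (c == '>') ? op_addp : op_subp;
            code.push_back({t, n});
        } else {
            ++i; // '[', ']', ',', '.' and comment characters handled elsewhere
        }
    }
    return code;
}
```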
After generating opcodes, it's quite easy to emit machine code into a memory region allocated by `mmap`. This region must be mapped read/write/exec so that we can execute the machine code stored in it. You can see the `mmap` call in `amd64jit::amd64jit(const size_t)` in `amd64jit.h`.
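A minimal sketch of such an allocation, assuming POSIX `mmap` (illustrative; the project's constructor may differ in details):

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstdint>

// Allocate a read/write/exec region to hold the generated machine code.
uint8_t* alloc_exec(std::size_t size) {
    void* p = mmap(nullptr, size,
                   PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? nullptr : static_cast<uint8_t*>(p);
}
```

On Windows the equivalent would be `VirtualAlloc` with `PAGE_EXECUTE_READWRITE`.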
I use a global `u8` array `buff[0x20000]` as the paper (tape) of the brainfuck machine (`rbx` stores the data pointer), and remember to use `memset` to zero the tape before running.
```cpp
/* set bf machine's paper pointer */
mem.push({0x48, 0xbb}).push64((uint64_t)buff); // movq $buff, %rbx
```
These four operators are not difficult to translate into machine code:
```cpp
case op_add: mem.push({0x80, 0x03, (uint8_t)(op.num & 0xff)}); break; // addb $op.num, (%rbx)
case op_sub: mem.push({0x80, 0x2b, (uint8_t)(op.num & 0xff)}); break; // subb $op.num, (%rbx)
case op_addp: mem.push({0x48, 0x81, 0xc3}).push32(op.num); break;     // add $op.num, %rbx
case op_subp: mem.push({0x48, 0x81, 0xeb}).push32(op.num); break;     // sub $op.num, %rbx
```
`op_out` calls `putchar` (`int putchar(int);`). Write a small demo, use objdump to see how gcc and clang generate the machine code that calls the function, then just copy it :)
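For example, a demo like this (the file name and wrapper function are just for illustration) can be compiled and then inspected with `objdump -d` to see which register receives the argument before the call:

```cpp
// demo.cpp: build with `g++ -O1 -c demo.cpp`, then run `objdump -d demo.o`.
#include <cstdio>

void emit(char c) {
    std::putchar(c); // watch how `c` lands in %edi (SysV) or %ecx (Win64)
}
```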
```cpp
mem.push({0x48, 0xb8}).push64((uint64_t)putchar); // movabs $putchar, %rax
#ifndef _WIN32
    mem.push({0x0f, 0xbe, 0x3b}); // movsbl (%rbx), %edi
#else
    mem.push({0x0f, 0xbe, 0x0b}); // movsbl (%rbx), %ecx
#endif
mem.push({0xff, 0xd0}); // callq *%rax
```
You may notice a small difference in the machine code generated for Windows. This is because the Windows x64 calling convention passes parameters differently from Linux/macOS/Unix: Linux/macOS/Unix (the System V ABI) pass the first integer argument in `rdi`, while Windows uses `rcx`. JIT-compiler developers have to keep this rule in mind, but the x86_64/amd64 calling conventions are still much easier to remember than the x86_32 ones...
`op_in` calls `getchar` (`int getchar();`). Again, we just use objdump to see how gcc/clang generate the calling code, and copy it :)
Luckily, on Windows, Linux, macOS, and other Unix systems, the `int` return value is stored in register `rax` (its low 32 bits, `eax`). So we just need to move the low 8 bits of `rax` into the cell at `(%rbx)` (i.e. `movb %al, (%rbx)`).
```cpp
mem.push({0x48, 0xb8}).push64((uint64_t)getchar); // movabs $getchar, %rax
mem.push({0xff, 0xd0});                           // callq *%rax
mem.push({0x88, 0x03});                           // movb %al, (%rbx)
```
So here we don't need an `#ifndef _WIN32` branch :)
`je` and `jne` are the two tricky parts of this project. You must calculate the distance between the two jump targets to make sure the relative offsets are patched correctly.
```cpp
amd64jit& amd64jit::je() {
    // emit `je rel32` with a zero placeholder; it gets patched in jne()
    push({0x0f, 0x84}).push32(0x0);
    stk.push(ptr); // remember the address right after this je
    return *this;
}

amd64jit& amd64jit::jne() {
    // emit `jne rel32` with a zero placeholder
    push({0x0f, 0x85}).push32(0x0);
    uint8_t* je_next = stk.top(); // address right after the matching je
    stk.pop();
    uint8_t* jne_next = ptr;          // address right after this jne
    uint64_t p0 = jne_next - je_next; // je jumps forward, past the jne
    uint64_t p1 = je_next - jne_next; // jne jumps backward, to just after the je
    // patch the rel32 of this jne
    jne_next[-4] = p1 & 0xff;
    jne_next[-3] = (p1 >> 8) & 0xff;
    jne_next[-2] = (p1 >> 16) & 0xff;
    jne_next[-1] = (p1 >> 24) & 0xff;
    // patch the rel32 of the matching je
    je_next[-4] = p0 & 0xff;
    je_next[-3] = (p0 >> 8) & 0xff;
    je_next[-2] = (p0 >> 16) & 0xff;
    je_next[-1] = (p0 >> 24) & 0xff;
    return *this;
}
```
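To make the offset arithmetic concrete: a rel32 field is relative to the address of the instruction right after the jump. A standalone sketch of the same patching idea (not the project's code):

```cpp
#include <cstdint>
#include <cstring>

// `next` points just past a jcc instruction whose last 4 bytes are its rel32.
// Patch that rel32 so a taken jump lands at `target`.
void patch_rel32(uint8_t* next, const uint8_t* target) {
    const int32_t rel = static_cast<int32_t>(target - next); // next-relative
    std::memcpy(next - 4, &rel, sizeof(rel));
}
```

In the code above, the `je` gets `jne_next - je_next` (jump forward past the `]`) and the `jne` gets `je_next - jne_next` (jump backward to just after the `[`).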
`op_jf` (`[`) uses `je`, and `op_jt` (`]`) uses `jne`.
bf code | opcode | machine code
---|---|---
`+` | op_add | `addb $op.num, (%rbx)`
`-` | op_sub | `subb $op.num, (%rbx)`
`>` | op_addp | `add $op.num, %rbx`
`<` | op_subp | `sub $op.num, %rbx`
`[` | op_jf | `je label`
`]` | op_jt | `jne label`
`,` | op_in | `callq *%rax` & `movb %al, (%rbx)`
`.` | op_out | `callq *%rax`
Here's a simple pattern that can be optimized: `[-]` (or `[+]`). Both loops just set `buff[p]` to zero, so we can add another opcode, `op_setz`:
bf code | opcode | machine code
---|---|---
`[-]` | op_setz | `movb $0, (%rbx)`
`[+]` | op_setz | `movb $0, (%rbx)`
```cpp
// movb $0, (%rbx)
case op_setz: mem.push({0xc6, 0x03, 0x00}); break;
```
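Detecting the pattern can be done while scanning the source, for example with a check like this (an illustrative sketch, not the project's code):

```cpp
#include <cstddef>
#include <string>

// True if src[i] starts a "[-]" or "[+]" clear-loop.
bool is_clear_loop(const std::string& src, std::size_t i) {
    return i + 2 < src.size() && src[i] == '[' &&
           (src[i + 1] == '-' || src[i + 1] == '+') && src[i + 2] == ']';
}
```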
Not implemented here, but worth mentioning: `-<++>-` could be rewritten as `--<++>`, which is called canonicalization. This requires the compiler to be able to analyze memory/variable access patterns in bf code.
And more patterns are waiting to be optimized:
- `[-<+>]`
- `[-<<->>]`
- `[->>+<<]`
- `[->++>>>+++++>++>+<<<<<<]`
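For example (not implemented in this project), `[-<+>]` is a move loop: each iteration decrements the current cell and increments the cell to its left, so with 8-bit wrapping cells the whole loop is equivalent to:

```cpp
#include <cstddef>
#include <cstdint>

// Equivalent effect of the bf move loop "[-<+>]" on 8-bit wrapping cells.
void move_left(uint8_t* buff, std::size_t p) {
    buff[p - 1] += buff[p]; // add the current cell into its left neighbour
    buff[p] = 0;            // the loop always exits with this cell cleared
}
```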
Want to check the machine code generated for a different CPU architecture? You may find this website useful: godbolt.org