Skip to content

ValKmjolnir/brainfuck-jit

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Brainfuck Just-In-Time Compiler

Introduction

Brainfuck is a very interesting programming language that has only 8 operators:

Operator Code in C/C++
+ buff[ptr]++
- buff[ptr]--
> ptr++
< ptr--
[ if (!buff[ptr]) goto ']'
] if (buff[ptr] goto '['
, buff[ptr] = getchar()
. putchar(buff[ptr])

This simple syntax makes brainfuck a great language for me to learn how to build an interpreter and JIT(just-in-time) compiler.

Brainfuck Interpreter

This project has a simple interpreter for brainfuck, using switch-threading:

for (size_t i = 0; i < code.size(); ++i) {
    switch (code[i].op) {
        case op_add: buff[p] += code[i].num; break;
        case op_sub: buff[p] -= code[i].num; break;
        case op_addp: p += code[i].num; break;
        case op_subp: p -= code[i].num; break;
        case op_jt: if (buff[p]) i = code[i].num; break;
        case op_jf: if (!buff[p]) i = code[i].num; break;
        case op_in: buff[p] = getchar(); break;
        case op_out: putchar(buff[p]); break;
    }
}

Basic Optimization

To optimize the efficiency of the interpreter, i count consecutive operators instead of just translating operators into an opcode. You will see the structure of opcode in jit.cpp.

For example:

bf code opcode
+++ buf[p] += 3
---- buf[p] -= 4
>>>>> p += 5
<< p -= 2

Just-In-Time Compiler

mmap

After generating opcodes, it's quite easy for us to generate machine codes into a memory space allocated by mmap.

This memory space must be read/write/exec so we could execute the machine codes in this memory space. You could see the mmap in amd64jit::amd64jit(const size_t) in file amd64jit.h.

I use a global u8 array buff[0x20000] to be the paper of brainfuck machine(and rbx stores the pointer), and remember to use memset to clean the stack space to zero-filled.

/* set bf machine's paper pointer */
mem.push({0x48, 0xbb}).push64((uint64_t)buff); // movq $buff, %rbx

Add & Sub Operations

These four operators are not so difficult to translate to machine codes:

case op_add:  mem.push({0x80, 0x03, (uint8_t)(op.num & 0xff)}); break; // addb $op.num, (%rbx)
case op_sub:  mem.push({0x80, 0x2b, (uint8_t)(op.num & 0xff)}); break; // subb $op.num, (%rbx)
case op_addp: mem.push({0x48, 0x81, 0xc3}).push32(op.num);    break; // add $op.num, %rbx
case op_subp: mem.push({0x48, 0x81, 0xeb}).push32(op.num);    break; // sub $op.num, %rbx

Library Function putchar & getchar

putchar

int putchar(int);

op_out uses the putchar, write a demo and use objdump to see how the gcc and clang generate the machine code that calls the function, then just copy them :)

mem.push({0x48, 0xb8}).push64((uint64_t)putchar); // movabs $putchar, %rax
#ifndef _WIN32
mem.push({0x0f, 0xbe, 0x3b}); // movsbl (%rbx), %edi
#else
mem.push({0x0f, 0xbe, 0x0b}); // movsbl (%rbx), %ecx
#endif
mem.push({0xff, 0xd0}); // callq *%rax

You may find that there's a small difference between generated machine code on Windows platform. This is because the rule of parameter passing in call convention of Windows is different from Linux/macOS/Unix. And Linux/macOS/Unix use rdi to get the first parameter, but Windows uses rcx.

Although JIT-compiler developers should remember this rule, it is quite easier to remember x86_64/amd64 call convention than x86_32...

getchar

int getchar();

op_in uses the getchar, also we just use the objdump to see how gcc/clang generate the code, and just copy them :)

Luckily, on Windows/Linux/macOS/Unix platform, the return value int will all be stored in register rax. And we just need to mov the low 8-bits of rax to rbx[0] (aka movsbl %al,(%rbx)).

mem.push({0x48, 0xb8})
   .push64((uint64_t)getchar); // movabs $getchar, %rax
mem.push({0xff, 0xd0});        // callq *%rax
mem.push({0x88, 0x03});        // movsbl %al, (%rbx)

So we don't need to write #ifndef _WIN32 and so on :)

Jump Operation

je and jne are two difficulties in this project. You must calculate the distance of two jump labels to make sure they work correctly.

amd64jit& amd64jit::je() {
    push({0x0f, 0x84}).push32(0x0); // je
    stk.push(ptr);
    return *this;
}

amd64jit& amd64jit::jne() {
    push({0x0f, 0x85}).push32(0x0); // jne
    uint8_t* je_next = stk.top();
    stk.pop();
    
    uint8_t* jne_next = ptr;
    uint64_t p0 = jne_next - je_next;
    uint64_t p1 = je_next - jne_next;
    jne_next[-4] = p1 & 0xff;
    jne_next[-3] = (p1 >> 8) & 0xff;
    jne_next[-2] = (p1 >> 16) & 0xff;
    jne_next[-1] = (p1 >> 24) & 0xff;
    je_next[-4] = p0 & 0xff;
    je_next[-3] = (p0 >> 8) & 0xff;
    je_next[-2] = (p0 >> 16) & 0xff;
    je_next[-1] = (p0 >> 24) & 0xff;
    return *this;
}

op_jf([) uses the je and op_jt(]) uses the jne.

Conclusion

bf code opcode machine code
+ op_add addb $op.num, (%rbx)
- op_sub subb $op.num, (%rbx)
> op_addp add $op.num, %rbx
< op_subp sub $op.num, %rbx
[ op_jf je label
] op_jt jne label
, op_in callq *%rax & movsbl %al, (%rbx)
. op_out callq *%rax

Simple Optimization

Here's a simple pattern that could be optimized:

[-]

Or:

[+]

This means we should set buff[p] to zero. So we could add another opcode op_setz:

bf code opcode machine code
[-] op_setz movb $0, (%rbx)
[+] op_setz movb $0, (%rbx)
// movb $0, (%rbx)
case op_setz: mem.push({0xc6, 0x03, 0x00}); break;

Advanced Optimization

Not implemented here, but need to mention.

-<++>-

Could be optimized to:

--<++>

Which is called canonicalization. This needs the compiler to have the ability to analyze memory/variable access patterns in bf.

And more possible patterns are waiting to be optimized:

[-<+>]
[-<<->>]
[->>+<<]
[->++>>>+++++>++>+<<<<<<]

More

Want to check the output machine code of different CPU arch?

You may need this website: godbolt.org