"Hello world" in Bismuth

Eniko Fox July 31, 2025

This is the third in a series of posts about a virtual machine I’m developing as a hobby project called Bismuth. I’ve talked a lot about Bismuth, mostly on social media, but I don’t think I’ve done a good job at communicating how you go from some code to a program in this VM. In this post I aim to rectify that by walking you through the entire life cycle of a hello world Bismuth program, from the highest level to the lowest.

let hello = data_utf8("Hello world!\n");

func main() i32 {
    // system call 0x10 is the PrintStr system call
    sys(0x10, hello, 0, sizeof(hello));
    return 0;
}

This code will be converted to the VM’s intermediate representation, which can then be transpiled to C, or compiled to a binary version of the IR, which the VM ingests and turns into bytecode and runs.

Bronze

The language this code is written in is Bronze, which converts mostly 1:1 into the VM’s IR. It’s just got niceties like a C-like syntax and infix operators. It makes for a good test-bed of the IR and I find it a bit more comfortable to write than s-expressions. Let’s go through it line by line.

let hello = data_utf8("Hello world!\n");

This first line sets up a global variable hello which has data which the VM should initialize when the program starts. The data should be initialized from the specified UTF8 encoded string. This means that when the program starts the global hello will contain a handle to some memory that contains the string in question.

func main() i32 {

This is the main function, automatically called when the program starts. It returns an i32, a 32-bit integer. This is one of the two types the VM currently supports, the other being pointer. More on that one in a future post. All functions in Bismuth are expected to return an integer. If you don’t want a function to return an integer, just return zero.

sys(0x10, hello, 0, sizeof(hello));

Because Bismuth is completely isolated from the outside world, you have to go through system calls to do anything interesting. These are hardcoded functions the VM can perform, like copying or clearing memory, printing strings or characters, converting between integers and strings, and more. Syscalls are like the standard library of the VM.

This particular syscall, 0x10, is PrintStr. The first argument is a handle to string memory, the second is the byte offset within memory at which to start printing characters, and the third is the number of bytes to print. Because all memory access goes through handles with the VM bounds checking the access, the system knows the size of allocated memory and sizeof(hello) can return the number of bytes stored in our string.
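To make the handle, offset, and length arguments concrete, here is a rough C sketch of what a bounds-checked print over an allocation could look like. Everything in it is an assumption made for illustration: the SketchAlloc struct, the sketch_print_str name, and the 1/0 return convention are not the VM's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the allocation behind a handle. */
typedef struct {
    const char* ptr;  /* start of the allocated memory */
    uint32_t size;    /* size of the allocation in bytes */
} SketchAlloc;

/* Writes `count` bytes starting at byte `offset` to `out`.
   Returns 1 on success, 0 if the range falls outside the allocation. */
static int sketch_print_str(FILE* out, SketchAlloc alloc, uint32_t offset, uint32_t count) {
    assert(out != NULL);
    if (alloc.ptr == NULL) return 0;
    /* Phrased this way so that offset + count can never overflow. */
    if (offset > alloc.size || count > alloc.size - offset) return 0;
    return fwrite(alloc.ptr + offset, 1, count, out) == count;
}
```

With a 13 byte allocation holding our string, an offset of 0 and a count of 13 prints the whole thing, while an offset of 6 with a count of 8 is rejected because it would read one byte past the end.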

Finally, after printing our string, we return:

return 0;

Text IR

Bronze, like all languages that would target the Bismuth VM, converts its code into plain-text IR. Running the compiler on the above code will produce the following IR:

(global hello)
(data hello utf8 "Hello world!\n")

(func main () {
    (sys 0x10 hello 0 (len hello))
    (ret 0)
})

Given that Bronze maps basically 1:1 to the IR this doesn’t contain any surprises. We declare our global, initialize it with a data statement, and then define the main function which prints it and returns.

This IR is read by the plain-text IR parser, which creates an abstract syntax tree. At this point there are two things that can be done by visiting each node in this AST: the C transpiler can convert the IR to C, or the binary IR compiler can convert the text version of the IR to binary.

We’ll start with the C transpiler. Being able to transpile to C offers numerous advantages. C code can be compiled ahead-of-time and is maximally portable, so by transpiling to C, programs could run on older hardware, embedded hardware, or even WebAssembly.

The C code can also be compiled by an optimizing compiler like GCC or Clang, so if you need more performance on modern systems than the VM offers you can transpile your code and run it stand-alone. The VM itself is written in C, so the relevant system calls and other code can be compiled with your program. In fact I’m already writing syscalls used by the VM in Bronze, which works because it’s transpiled to C.

Bismuth C

So what does this C code look like? Well, I tried my best to keep it readable but it’s C, so it’s not strictly the prettiest code you’ll have ever seen:

static uint32_t xhello;

uint32_t MyMain(EXCEPTION* eret, CONTEXT context);
uint32_t MyMain(EXCEPTION* eret, CONTEXT context) {
#define EXCEPTION_HANDLER default_exception_handler
    EXCEPTION e = {0};
    {
        uint32_t t0, t1;

        // (sys 16 hello 0 (len hello))
        HLEN(t1, GLOBAL(xhello));
        SYS(t0, PrintStr, GLOBAL(xhello), 0, t1);
        // (ret 0)
        return 0;
    }
#undef EXCEPTION_HANDLER
    // default return value and default exception handler
default_exception_handler:;
    *eret = e;
    return 0;
}

static const char rodata_xhello[13] = {
    0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64, 0x21, 0x0a, 
};

uint32_t BismuthC_ModuleInit(EXCEPTION* eret, CONTEXT context, bool isGlobalContext) {
#define EXCEPTION_HANDLER default_exception_handler
    EXCEPTION e = {0};
    {
        Alloc mi_alloc;
        if (!InitCGlobal(&e, context, &xhello, isGlobalContext, true)) { THROW(e); }
        if (!isGlobalContext) {
            ALLOC(GLOBAL(xhello), 13, 1);
            mi_alloc = Handles_Get(&context->Handles, GLOBAL(xhello));
            if (mi_alloc.Ptr == NULL) THROW(OUT_OF_MEMORY);
            memcpy(mi_alloc.Ptr, rodata_xhello, 13);
        }
        return 1;
    }
#undef EXCEPTION_HANDLER
    // default return value and default exception handler
default_exception_handler:;
    *eret = e;
    return 0;
}

(Note that I’ve skipped some uninteresting header/footer bits here to reduce the noise.)

There’s a lot to unpack here so let’s look at the big picture first.

static uint32_t xhello;

uint32_t MyMain(EXCEPTION* eret, CONTEXT context);
uint32_t MyMain(EXCEPTION* eret, CONTEXT context) {
    ...
}

static const char rodata_xhello[13] = {
    ...
};

uint32_t BismuthC_ModuleInit(EXCEPTION* eret, CONTEXT context, bool isGlobalContext) {
   ...
}

First we see there’s a static (private) 32-bit unsigned integer called xhello. The transpiler defaults to prefixing all identifiers with an ‘x’ to avoid name collisions. Then the transpiler declares the signature of all the functions, of which there is one, and then it outputs the actual function bodies. There’s a weirdly named rodata_xhello global that contains an array of hex data. Then there’s something that looks kind of like a normal function called BismuthC_ModuleInit.

You’ll notice that my main function isn’t called xmain here and isn’t static. That’s because the transpiler can be given symbol names to export as different names via the command line. Here I’ve specified -export main MyMain. A similar thing can be done to rename the module init function using -initname, though I didn’t do it here.

Main C function

Let’s look at our main function next.

uint32_t MyMain(EXCEPTION* eret, CONTEXT context) {
#define EXCEPTION_HANDLER default_exception_handler
    EXCEPTION e = {0};
    {
        uint32_t t0, t1;

        // (sys 16 hello 0 (len hello))
        HLEN(t1, GLOBAL(xhello));
        SYS(t0, PrintStr, GLOBAL(xhello), 0, t1);
        // (ret 0)
        return 0;
    }
#undef EXCEPTION_HANDLER
    // default return value and default exception handler
default_exception_handler:;
    *eret = e;
    return 0;
}

The function returns a 32-bit integer, just like all functions in Bismuth. It also takes an EXCEPTION pointer eret and a CONTEXT. Bismuth supports try/catch/finally style syntax, and eret is used when there’s an uncaught exception. That case is handled by the default_exception_handler, which simply stores e in *eret and returns a default value. The program context is used by many of the macros that make Bismuth C tick, more on that later.

Next we abuse macros to tell the C compiler where the current exception handler lives. At the start of a function that’s always default_exception_handler, but a try/catch block would redefine this for the length of that block. We also initialize our local error object e, which is actually just a handle or 32-bit integer, to zero. A lot of macros like EXCEPTION seen in Bismuth C are there for some degree of future proofing; the transpiler doesn’t need to know the exact type of EXCEPTION or CONTEXT, it just needs to know that they exist, whatever they end up meaning.

Now we get to the actual user code.

        uint32_t t0, t1;

        // (sys 16 hello 0 (len hello))
        HLEN(t1, GLOBAL(xhello));
        SYS(t0, PrintStr, GLOBAL(xhello), 0, t1);
        // (ret 0)
        return 0;

At the top is where the transpiler would declare local variables if we had any. They, like other identifiers, would be prefixed by an ‘x’, so foo would turn into xfoo. It also declares what I call “temporaries.” These are numbered variables prefixed with a ‘t’ and they’re here because while I sometimes really enjoy C as a language, it also really really sucks. Let me explain.

First, C doesn’t strictly guarantee the order of operations for certain things, including function arguments. Meanwhile, Bismuth does guarantee things are done in a certain order. This means that whenever the transpiler finds code that could have side-effects and so must be done in order even though C gives no such guarantees, that code must be broken out into separate statements, done in-order in advance, and the results passed in-order to whatever we’re doing like making a function call.

Temporaries are created when the C transpiler has to break out this code. The sub-expression is lifted out of where it is, to before the current statement, evaluated, stored in a tN variable, and then used.
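As a small stand-alone illustration of why the temporaries matter, consider a helper with a side effect that is used twice as an argument. The sk_ names below are hypothetical and not transpiler output; the point is only that hoisting each call into its own statement pins down the order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical side-effecting helper: returns 1, 2, 3, ... on successive calls. */
static int32_t sk_counter = 0;
static int32_t sk_next(void) { return ++sk_counter; }

static int32_t sk_sub(int32_t a, int32_t b) { return a - b; }

/* Writing sk_sub(sk_next(), sk_next()) directly would let the C compiler
   evaluate either argument first, so the result could be -1 or 1.
   Hoisting each call into a numbered temporary forces left-to-right order. */
static int32_t sk_sub_in_order(void) {
    int32_t t0, t1;
    t0 = sk_next();       /* first argument, evaluated first */
    t1 = sk_next();       /* second argument, evaluated second */
    assert(t0 + 1 == t1); /* t0 was produced before t1 */
    return sk_sub(t0, t1);
}
```

Called this way, sk_sub_in_order always subtracts the later value from the earlier one, regardless of which compiler builds it.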

Second, anything which requires a statement expression, that is a block of code with multiple statements needed to arrive at the result, is also broken out the same way. Why? Because statement expressions are a compiler extension and so aren’t universally available. Even though using them would improve code readability, it would harm portability, which makes them unacceptable. A simple example would be the syscall macro:

#define SYS(dest, call, ...) { \
    dest = BISMUTHC_CONCAT(Sys_, call)(&e, context, __VA_ARGS__); \
    if (IS_EXCEPTION(e)) { THROW(e); } \
}

This macro actually calls a named C function for the system call with the address of our local exception, the program context, and then whatever other arguments there are. But it (like many macros) then transparently checks whether an exception was passed down the call stack, and then throws it if so. So even though in our IR sys is an expression, we need a block of statements in C to handle all of its functionality.

Looking at the actual code our program wants to run illustrates this quite nicely:

        // (sys 16 hello 0 (len hello))
        HLEN(t1, GLOBAL(xhello));
        SYS(t0, PrintStr, GLOBAL(xhello), 0, t1);

First, the transpiler creates a comment that shows the reader what IR created the C code directly below it. Because getting the length of a memory handle is something that could throw an exception (the handle could be invalid, for example) that expression is broken out and done first, assigning the result to t1. Then the SYS macro assigns the result to t0, says it wants to call Sys_PrintStr, and then provides the arguments.

Bismuth C module initializer

We’ll figure out what this whole GLOBAL macro business is when we check out the module initializer: BismuthC_ModuleInit. I’ll be cutting out the pre- and post-amble to get right to the meat and potatoes of the function:

        Alloc mi_alloc;
        if (!InitCGlobal(&e, context, &xhello, isGlobalContext, true)) { THROW(e); }
        if (!isGlobalContext) {
            ALLOC(GLOBAL(xhello), 13, 1);
            mi_alloc = Handles_Get(&context->Handles, GLOBAL(xhello));
            if (mi_alloc.Ptr == NULL) THROW(OUT_OF_MEMORY);
            memcpy(mi_alloc.Ptr, rodata_xhello, 13);
        }

The module initializer for a program initializes the global state of an instance of a Bismuth program that’s been transpiled to C. Each instance of a program context has its own memory space and its own table which maps handles to memory, so we can’t simply set our global xhello to the value of the handle. Rather, xhello needs to uniquely identify where in our program context the value of its corresponding global can be found.

To do this, the module initializer actually has to be run at least twice. Before any program can run a global program context must run the module initializer, which will initialize all globals to this unique value. It’s easier to understand by looking at the InitCGlobal function:

uint32_t InitCGlobal(EXCEPTION* eret, ProgramContext* context, uint32_t* global, bool isGlobalContext, bool canThrow) {
    assert(global != NULL);

    if (isGlobalContext) {
        assert(*global == 0);

        if (context->CGlobalsCount == UINT32_MAX) {
            if (canThrow) {
                THROW_RET(TOO_MANY_CGLOBALS);
            }
            else {
                return 0;
            }
        }

        *global = context->CGlobalsCount++;
    }
    else {
        assert(context->CGlobals != NULL);

        if (*global >= context->CGlobalsCount) {
            if (canThrow) {
                THROW_RET(CGLOBAL_OUT_OF_BOUNDS);
            }
            else {
                return 0;
            }
        }
    }
    return 1;
}

The first time this function is called, with the global context, the C globals are counted and each global is assigned its index in an array of C globals held by the context. Subsequent calls of this function for different program contexts ensure that the global’s value is valid.

So what does GLOBAL(xhello) do? It just transforms into context->CGlobals[xhello], looking up the value of the global in our array of C globals unique to this program context.
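The index-instead-of-value idea is easier to see in a stripped-down form. The sketch below uses a hypothetical SkContext with a fixed-size table and made-up sk_ helpers; it leaves out exceptions and the validity checks InitCGlobal performs, and only shows how one shared static can select a different slot in each context.

```c
#include <assert.h>
#include <stdint.h>

#define SK_MAX_GLOBALS 8

/* Hypothetical, simplified program context: just a table of global values. */
typedef struct {
    uint32_t globals[SK_MAX_GLOBALS];
    uint32_t count;
} SkContext;

/* Shared by every context: holds the global's index, not its value. */
static uint32_t sk_hello;

/* Look up a global's per-context value through its shared index. */
#define SK_GLOBAL(index) (context->globals[(index)])

/* Registration pass, run once on the global context: hand out the next index. */
static void sk_register(SkContext* context, uint32_t* global) {
    assert(context->count < SK_MAX_GLOBALS);
    *global = context->count++;
}

/* Per-context access: the same index picks a different slot in each context. */
static void sk_set_hello(SkContext* context, uint32_t value) {
    SK_GLOBAL(sk_hello) = value;
}

static uint32_t sk_get_hello(const SkContext* context) {
    return SK_GLOBAL(sk_hello);
}
```

After one registration pass, two separate contexts can store different handles for the same global without stepping on each other.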

Next is this bit of code:

        if (!isGlobalContext) {
            ALLOC(GLOBAL(xhello), 13, 1);
            mi_alloc = Handles_Get(&context->Handles, GLOBAL(xhello));
            if (mi_alloc.Ptr == NULL) THROW(OUT_OF_MEMORY);
            memcpy(mi_alloc.Ptr, rodata_xhello, 13);
        }

This actually initializes the data for our global variable. This data initialization isn’t run for the global context because the global context never actually runs any code, so it’d just be a waste of effort and memory.

This code allocates 13 bytes of memory for our global and passes in a non-zero value to indicate this memory handle is privileged. That is, only privileged code can access the memory associated with this handle. Right now all C code is considered privileged, mostly because I’m only using the C transpiler to write syscalls for the VM right now. Multiple privilege levels in C transpiled code is on my todo list.

After allocating the memory, we fetch the actual allocation containing the pointer, size, and flags from the handles table as mi_alloc. If the allocation failed our global will be set to a zero handle and so attempting to fetch it will give us an empty Alloc struct with a null pointer. So if we detect that, we throw an out of memory exception.

And this illustrates a neat thing about Bismuth C: even when we’re directly writing the C code ourselves we can still use non-C constructs like exceptions so long as our functions conform to the parameters of a Bismuth C function. In the future it could be possible to transpile a Bismuth program to C and use it as a C library, wouldn’t that be neat?

Our final act is to use memcpy to copy the data from rodata_xhello to the memory pointed at by our global’s handle. Having set everything up appropriately, we exit the module initializer and our program is ready to run. At this point calling this function is as easy as:

Handle e = HNULL;
MyMain(&e, context);
if (e) printf("Error executing Bismuth C code.\n");

Binary IR

Now let’s look at the other branch, where we turn the text-based IR into binary. I’m honestly not sure if long term this step will remain or not, but for now I find it convenient to have the binary IR that represents the program’s abstract syntax tree because I can do things like read the IR and emit bytecode in a single interleaved pass. This makes dealing with reading the IR easier on the side of the VM because I just don’t find C great at text parsing. The other compilers, meanwhile, are written in C#, which is a lot better for that kinda thing. It also allows me to do things like inserting code at the beginning of the program which invokes the start system call and then calls main.

Data section

Let’s open the binary in a hex editor and have a little look.

[Image: Bismuth binary IR opened in a hex editor]

We can see our “Hello world!” string. This is in the data section at the start of the file. The first three 32-bit values are 11, 1, and 1. The first, 11, is the data section size in 32-bit words. That means the data section size in bytes is 44, which takes us right past /DAT to the leading ‘S’ in SIGS, the next section. The next number is the number of globals in this program, followed by the number of globals that have to be initialized with data. After all, some globals can simply be initialized to zero.

After that header, there will be a number of entries equal to the initialization count, detailing what globals to initialize and how. Each entry starts with the global’s index, in this case we only have one and the first global index is zero. After this is the 32-bit size in bytes of the global, and then a 32-bit integer containing flags.

The size of our global is 13 bytes, the number of bytes in our UTF8 formatted string. The flags value is 1 which indicates this global has data to initialize. This “has-data” flag will only ever not be set for globals which have a simple integer value rather than allocated memory. The next two bits indicate whether or not the memory is privileged (it isn’t) and whether or not the memory is read-only (it isn’t.)

Then comes the data for our string, and 3 padding null bytes. Because I’m an inelegant hack I decided that I wanted to store the data section the interpreter uses to initialize globals when the start syscall happens right inside the same bytecode it uses to run programs. The interpreter’s bytecode is all 32-bit words, so that’s why the padding bytes are there; they align the data to 32-bit words. This is also why the data section size is stored as words rather than bytes.

After that we see four /DAT bytes, which signify the end of the data section. I use those in the interpreter to verify that nothing went wrong with the data section during compilation from binary IR to bytecode.
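The word counts above can be double-checked with a little arithmetic. The helpers below are hypothetical and only model the layout as described in this section, under the assumption that every initializer entry carries data: three header words, three words per entry plus its padded data, and one closing marker word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 32-bit words needed to hold `bytes` bytes, rounding up. */
static uint32_t sk_words_for(uint32_t bytes) {
    return (bytes + 3u) / 4u;
}

/* Size in words of a data section whose initializers all carry data:
   3 header words (section size, global count, initializer count),
   3 words per entry (index, size in bytes, flags) plus its padded data,
   and 1 word for the closing "/DAT" marker. */
static uint32_t sk_data_section_words(const uint32_t* entry_sizes, size_t entry_count) {
    uint32_t words = 3u;
    assert(entry_sizes != NULL || entry_count == 0);
    for (size_t i = 0; i < entry_count; i++) {
        words += 3u + sk_words_for(entry_sizes[i]);
    }
    return words + 1u;
}
```

For our single 13 byte string this gives 4 data words (13 bytes plus 3 bytes of padding) and 3 + 3 + 4 + 1 = 11 words in total, or 44 bytes, matching the header value in the file.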

Program sections

After the data section come three program sections, SIGS, FUNS, and PROG. The SIGS section contains an array of function signatures, starting with a 32-bit entries count, followed by the entries which for each signature encode the number of arguments as well as their type information. We only have one function and it takes zero arguments.

Then there’s FUNS which is the functions section. This again starts with a count of entries, followed by the entries. For each entry it writes the index of the function signature for that function (0 in this case), followed by the function’s label (label 1.)

The reason signatures are stored separately from functions is so that, when I implement function pointers in the future, dynamic function invocations can be type-checked by simply comparing the index of the calling signature with the index of the receiving function’s signature.
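Since function pointers don't exist yet, the following is purely a sketch of how such a check might look. The SkFunction struct mirrors the FUNS entries described above (signature index plus label), but the names and the sk_call_is_valid helper are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One FUNS entry as described above: a signature index and a label. */
typedef struct {
    uint32_t signature;
    uint32_t label;
} SkFunction;

/* Returns 1 if function `target` exists and its signature index matches
   the one the call site expects, 0 otherwise. */
static int sk_call_is_valid(const SkFunction* functions, size_t count,
                            uint32_t target, uint32_t expected_signature) {
    assert(functions != NULL || count == 0);
    if (target >= count) return 0;
    return functions[target].signature == expected_signature;
}
```

The appeal of this design is that the check is a single integer comparison, no matter how many arguments the signature has.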

Finally we have the PROG section, which contains the actual program data: the binary IR. This leads with a count of “functions.” The word functions is in scare-quotes here because what this actually means is the number of top-level statements to process. These are almost all actual functions, but the starting code the binary IR compiler injects isn’t an actual function but a simple block of statements.

I’m a big believer in making binary formats as human-readable as possible so every node in our AST has a 2 or 3 character ID which identifies the type of AST node it is. Statements are always all caps, and you can see a few of them in this example: {} for a block containing multiple statements, __ for an expression statement (or discard,) FN for function, and RT for return. Expressions are always lower case.

The way the binary reader distinguishes between 2 and 3 character codes is by checking if the first character of the 2 character code is a space. If it is, then the code is contained in the next 3 bytes, otherwise there’s just the two characters.
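A reader for these IDs might look something like the sketch below. It assumes that a 3 character code is stored as a single space byte followed by the three characters, which is one reading of the description above; the function name and the "abc" code used to exercise it are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reads one node ID from `bytes` into `id` (NUL-terminated, up to 3 chars).
   Returns the number of bytes consumed, or 0 if the input is too short.
   Assumption: a 3-character ID is stored as a space followed by 3 bytes. */
static size_t sk_read_node_id(const unsigned char* bytes, size_t length, char id[4]) {
    assert(bytes != NULL && id != NULL);
    if (length < 2) return 0;
    if (bytes[0] == ' ') {
        if (length < 4) return 0;
        memcpy(id, bytes + 1, 3);
        id[3] = '\0';
        return 4;
    }
    memcpy(id, bytes, 2);
    id[2] = '\0';
    return 2;
}
```

Feeding it the two bytes "FN" yields the ID FN and consumes 2 bytes, while a leading space makes it consume 4 bytes and return the 3 characters that follow.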

We can see that our program starts with a block {}. The block is followed by a 7-bit encoded integer with the number of statements in that block. Its first statement is an expression which invokes system call sy 0 (16-bit) which is the start system call, with 1 argument (8-bit), and is followed by the argument. The argument is an immediate number # with a value of 1, indicating the start system call should allocate space for a single global.

Next is another expression statement, with a call () node, calling the 0th function, which is our main function, providing a whopping zero arguments. Then after returning there’s another expression statement which invokes system call 1, which exits the program.

The body of our main function FN follows with its function index (0) and locals count and information (also 0.) This is followed by a block {} containing a statement which calls system call 16 or 0x10, which we know is PrintStr. The first argument is a global @ with an index of zero, our string global. The second is an integer, zero, for the offset, followed by [] which grabs the length of our global. This is all followed by the return RT statement which returns an immediate value of zero.
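The statement count of a block was described above as a 7-bit encoded integer. The post doesn't spell out the exact encoding, so the decoder below assumes the common scheme where each byte carries 7 bits of the value, least significant group first, and a set high bit means another byte follows. Treat it as an illustration of that scheme rather than a description of the actual file format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one unsigned 7-bit encoded integer (assumed: low group first,
   high bit set means "more bytes follow"). Returns the number of bytes
   consumed, or 0 if the input is truncated or longer than 5 bytes. */
static size_t sk_read_7bit(const uint8_t* bytes, size_t length, uint32_t* value) {
    uint32_t result = 0;
    assert(bytes != NULL && value != NULL);
    for (size_t i = 0; i < length && i < 5; i++) {
        result |= (uint32_t)(bytes[i] & 0x7Fu) << (7 * i);
        if ((bytes[i] & 0x80u) == 0) {
            *value = result;
            return i + 1;
        }
    }
    return 0;
}
```

Under this scheme small counts like the ones in our program fit in a single byte, and a value such as 300 takes two bytes (0xAC, 0x02).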

The binary IR isn’t super easy to read, but when I need to debug it I can read it okay with a little bit of effort, which is a lot better than most binary formats out there.

Finally, we load this binary IR into our interpreter, which flattens out the AST into a series of bytecode instructions.

Interpreter bytecode

I’m not going to do a thorough play-by-play for this one as an exhaustive description of each bytecode operation would take forever, but I will show the debug output for the bytecode and go through it more generally.

The VM mainly uses three general purpose registers: main, alt, and extra. Generally speaking opcodes are in the form of operation destination [arguments]. Any opcode that’s followed by a 32-bit word containing additional data is indicated with a little arrow -> and a contextual value.

The big picture is that the VM first transplants the data section into the bytecode:

00000000    11 data size
00000001    1 globals
00000002    1 global initializers
  -> 00000009
0000000a    /DAT (magic bytes should be "/DAT")

This section is used by the start syscall to initialize globals. Next is the bytecode for the start section inserted by the binary IR compiler:

0000000b      Movi Alt 1
0000000c      Push Alt
0000000d       Sys 0 (Start)
0000000e      Call
    -> +4 (0x12)
00000010      Txnz
00000011       Sys 1 (Exit)

It invokes the start system call with the value 1 indicating the number of globals. It then calls a function, jumping to 0x12 to do so, throws if the exception register is non-zero, then invokes the exit system call. Our main function is at 0x12:

00000012      Func 0
00000013      Puxh
    -> +13 (0x20)
(...)
00000020      Poxh
00000021      Movi Main 0
00000022       Ret

This allocates space for zero locals in the current stack frame, then pushes an exception handler (puxh) which, if an exception occurs, jumps to 0x20. Looking ahead to 0x20 you can see the exception handler is popped (poxh) and it returns zero. Because the exception register isn’t cleared, the start code’s txnz operation will propagate the exception. Since there’s no exception handler here the interpreter will panic with the error message if an exception is propagated. This is the standard sort of pre- and post-amble for a function.

Next is our actual code:

00000015      Gloi Alt 0
00000016      Push Alt
00000017      Movi Alt 0
00000018      Push Alt
00000019      Gloi Main 0
0000001a       Len Main Main
0000001b      Push Main
0000001c       Sys 16 (PrintStr)
0000001d      Movi Main 0
0000001e      Poxh
0000001f       Ret

The gloi (global load immediate) opcode loads global zero, our string, into a register and pushes it to the stack. Next we push a zero to the stack. Then we load our global into a register again, but this time we use the len operation which fetches the size in bytes of the memory pointed at by the handle. Finally we push that to the stack as well, and invoke the PrintStr syscall which prints our string. Then we pop the exception handler and return 0.

Mind you that system calls work differently from normal functions. A normal function call stores its arguments on the stack in reverse order (i.e. the first argument ends up at the top of the stack) and creates and tears down stack frames. To invoke a system call you push the arguments in order, and system calls are more lightweight because they don’t require stack frame creation/teardown.
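The difference in push order can be shown with a toy stack. The SkStack type and helpers below are hypothetical and have nothing to do with the interpreter's real stack layout; they only demonstrate which argument ends up on top under each convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_STACK_SIZE 16

typedef struct {
    uint32_t slots[SK_STACK_SIZE];
    size_t count; /* number of values currently on the stack */
} SkStack;

static void sk_push(SkStack* stack, uint32_t value) {
    assert(stack->count < SK_STACK_SIZE);
    stack->slots[stack->count++] = value;
}

/* Value at the top of the stack (the most recently pushed one). */
static uint32_t sk_top(const SkStack* stack) {
    assert(stack->count > 0);
    return stack->slots[stack->count - 1];
}

/* System call convention: push in order, so the last argument ends up on top. */
static void sk_push_syscall_args(SkStack* stack, const uint32_t* args, size_t count) {
    for (size_t i = 0; i < count; i++) sk_push(stack, args[i]);
}

/* Function call convention: push in reverse, so the first argument ends up on top. */
static void sk_push_call_args(SkStack* stack, const uint32_t* args, size_t count) {
    for (size_t i = count; i > 0; i--) sk_push(stack, args[i - 1]);
}
```

For the three PrintStr arguments (handle, offset, length) the system call convention leaves the length on top, which matches the order of the push operations in the bytecode listing above.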

With our bytecode ready we can initialize our global context (to figure out the index of any C globals,) then our program context, and finally run the interpreter. We get the following output:

Creating global context
Creating program context
Running script
Hello world!
Done!

And that’s the life cycle of a hello world program in Bismuth. This was a long one so kudos on making it all the way to the end! If you’ve enjoyed this one, maybe check out the other posts about Bismuth and I hope you’ll join me again for the next one.