Pochi Doc

Core of the VM

The core of the VM acts kind of like the CPU for the VM. Put differently, the core is a byte-code interpreter, where the byte-code is the “machine code” of the Pochi VM.

Overview of the architecture

If you are familiar with either Hundred Rabbits’ uxn or GreenArrays’ F18a, Pochi will probably feel somewhat familiar too, as both projects provided the initial inspiration on top of which Pochi was built.

In a single sentence:

The Pochi VM is a 32-bit dual-stack machine with (for now) 52 six-bit opcodes, 7 registers, and (for now) 256KB of RAM, plus a variable number of devices.

The core of the VM is roughly 200 lines of C, and the goal is to keep it in that ballpark by prioritizing simplicity and ease of re-implementation.

32 bits seems a good compromise between power and backward compatibility. Ultimately, Pochi should be able to run on top of DuskOS as well as WASM, both of which can handle 32 bits. This hedges against many possible future scenarios, short of a total loss of information technology capability.

Stacks

Pochi has 2 stacks:

A circular working stack (also known as the data stack in Forth) of 8 cells plus a top-of-stack register t. This effectively means that the stack size is 9.

A regular 256-cell return stack, whose top-of-stack register is called r.

Circular stack

The circular stack is an experiment. You are encouraged to challenge this design, and not to be afraid of pushing it to its limits.

The working stack is circular, which means that it can’t overflow or underflow (unlike the return stack; errors are discussed further down). Values can, however, be stomped if you push more than 9 values on the stack.

An interesting technique often used by Chuck Moore himself is to prefill the stack with all the values needed for the subsequent computations. That way he never needs to push anything on the stack and can keep popping values from it indefinitely.

Beware that the circularity of the stack only covers 8 values: the top of stack t is not in the “circle”.
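To make the circular behaviour concrete, here is a minimal sketch in C of a working stack with 8 cells plus t. The struct layout and the names WorkStack, ws_push and ws_pop are hypothetical and are not taken from pochi.c; the sketch only illustrates the behaviour described above.

```c
#include <assert.h>
#include <stdint.h>

#define WS_CELLS 8u  /* cells in the circle; t sits outside of it */

/* Hypothetical layout, for illustration only (not taken from pochi.c). */
typedef struct {
    uint32_t t;               /* top-of-stack register */
    uint32_t cell[WS_CELLS];  /* the 8-cell circle below t */
    uint32_t sp;              /* index of the cell just below t */
} WorkStack;

/* Push: the old t moves into the circle and may stomp the oldest value. */
static void ws_push(WorkStack *w, uint32_t v) {
    assert(w->sp < WS_CELLS);
    w->sp = (w->sp + 1u) % WS_CELLS;
    w->cell[w->sp] = w->t;
    w->t = v;
}

/* Pop: t is refilled from the circle. The cell is left intact, so
   popping forever keeps cycling through the same 8 values. */
static uint32_t ws_pop(WorkStack *w) {
    assert(w->sp < WS_CELLS);
    uint32_t v = w->t;
    w->t = w->cell[w->sp];
    w->sp = (w->sp + WS_CELLS - 1u) % WS_CELLS;
    return v;
}
```

With this sketch, pushing 9 values and popping 9 times returns them all in reverse order, after which further pops cycle through the 8 values left in the circle. Pushing a 10th value stomps the oldest one.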

Deep reversible stack

This is also something that we could experiment on.

It seems counterintuitive to people who followed Chuck Moore and Leo Brodie’s advice to keep a shallow stack. But Devine piqued my interest the other day by saying:

...
<neauoire> deep reversible stacks
<neauoire> is where it's at
<neauoire> ;)
...

This might be worth exploring for Pochi as well. Here are some of the references he gave me:

Something extremely interesting I found in the last link is the following paragraph:

Since Forth is usually implemented on a traditional von Neumann machine, one thinks of the return stack as holding “return addresses”. However, in these days of large instruction caches, in which entire cache lines are read from the main memory in one transaction, this view should be updated. It is well-known that non-scientific programs have a very high rate of conditional branches, with the mean number of instructions between branches being on the order of 10 or less. Forth programs are also very short, with “straight-line” (non-branching) sequences averaging 10 items or less. In these environments, it makes more sense to view the return stack itself as the instruction buffer cache! In other words, the return stack doesn’t hold “return addresses” at all, but the instructions themselves! When a routine is entered, the entire routine is dumped onto the top of the return stack, and execution proceeds with the top item of this stack. Since routines are generally very short, the transfer of an entire routine is about the same amount of work as transferring a complete cache line in present architectures. Furthermore, an instruction stack-cache-buffer is normally accessed sequentially, and therefore can be implemented using shift register technology. Since a shift register can be shifted faster than a RAM can be accessed, the “access time” of this instruction stack-cache-buffer will not be a limiting factor in a machine’s speed. Executing a loop in an instruction stack-cache-buffer is essentially the making of connections necessary to create a cyclic shift register which literally cycles the instructions of the loop around the cyclic shift register.

Using the return stack to hold the instructions directly sounds like it would marry well with how Pochi packs up to 6 instructions in a single cell (see Slots).

Opcodes

Warning: Opcodes will probably change a lot. For now I have just implemented them as I needed them, but if it’s possible to reduce the number of opcodes without sacrificing too much performance or convenience, it will be done. The opcode order will also probably change.

Opcodes are 6 bits wide, which gives us room for up to 64 opcodes, but as we are concerned with simplicity of implementation, if we can use fewer opcodes, we will. The opcodes are based on the 32 instructions of the F18a, with the “multiply-step” replaced by a simple multiplication. A few other instructions were added for performance, like / or swap, and some were added to ease the handling of string data: the c@, c!, c@+ and c!+ words.

To understand what the opcodes are doing, you could have a look at GreenArrays’ amazing documentation of the F18a; the other opcodes should feel fairly familiar if you are acquainted with Forth. Otherwise, the best source of truth for what the opcodes are and what they do is the code in pochi.c. It shouldn’t be too hard to understand.

Slots

Since opcodes are 6 bits wide and the VM is 32 bits, we can pack 5 opcodes plus a special 2-bit opcode in a single cell (5 × 6 + 2 = 32 bits). (We could say word, but word has a different meaning in the Forth context, so we stick with cells.) This means that a cell is composed of 6 slots, numbered from 0 to 5.
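As an illustration, slot packing and extraction could look like the sketch below. The bit layout is an assumption: the text does not say which end of the cell slot 0 lives in, so this sketch puts slot 0 in the most significant 6 bits and the 2-bit slot 5 in the lowest bits. The names cell_slot and cell_pack are hypothetical; pochi.c remains the source of truth.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED layout (not confirmed by the text): slot 0 occupies the most
   significant 6 bits, slots 1..4 follow, and slot 5 is the low 2 bits. */
static uint32_t cell_slot(uint32_t cell, unsigned slot) {
    assert(slot <= 5u);
    if (slot == 5u)
        return cell & 0x3u;                      /* the 2-bit slot */
    return (cell >> (26u - 6u * slot)) & 0x3Fu;  /* a 6-bit slot */
}

/* Pack five 6-bit opcodes and one 2-bit opcode into a single cell. */
static uint32_t cell_pack(const uint8_t op[5], uint8_t last) {
    uint32_t cell = last & 0x3u;
    for (unsigned s = 0; s < 5u; s++)
        cell |= (uint32_t)(op[s] & 0x3Fu) << (26u - 6u * s);
    return cell;
}
```

Under this assumed layout, executing a cell amounts to calling cell_slot with slot numbers 0 through 5 in order.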

Registers

Along with the stacks and memory, the Pochi VM uses a few registers:

A and B

The register a can be read (a) and written (a!). It is usually used to hold an address for the words @ and !, which respectively “fetch through a” and “store through a”. Additionally, there are post-increment versions of fetch and store: @+ and !+. Those are pretty useful to read and write serial data.

The register b is write-only: it cannot be read or incremented. You can only:

store to b => b!
fetch through b => @b
store through b => !b
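The words above can be sketched in C as follows. This is only a rough model: the Machine struct, the function names and the tiny 16-cell memory are hypothetical, and the sketch assumes that one address designates one whole cell and that post-increment advances a by 1, neither of which is stated here.

```c
#include <stdint.h>

#define MEM_CELLS 16u  /* tiny memory, only for this sketch */

typedef struct {
    uint32_t a, b;            /* the two address registers */
    uint32_t mem[MEM_CELLS];  /* ASSUMPTION: one address = one cell */
} Machine;

/* Addresses wrap inside the toy memory so the sketch stays in bounds. */
static uint32_t *at(Machine *m, uint32_t addr) {
    return &m->mem[addr % MEM_CELLS];
}

static void     a_store(Machine *m, uint32_t v)   { m->a = v; }               /* a! */
static uint32_t a_read(Machine *m)                { return m->a; }            /* a  */
static uint32_t fetch(Machine *m)                 { return *at(m, m->a); }    /* @  */
static void     store(Machine *m, uint32_t v)     { *at(m, m->a) = v; }       /* !  */
static uint32_t fetch_inc(Machine *m)             { return *at(m, m->a++); }  /* @+ */
static void     store_inc(Machine *m, uint32_t v) { *at(m, m->a++) = v; }     /* !+ */
static void     b_store(Machine *m, uint32_t v)   { m->b = v; }               /* b! */
static uint32_t fetch_b(Machine *m)               { return *at(m, m->b); }    /* @b */
static void     store_b(Machine *m, uint32_t v)   { *at(m, m->b) = v; }       /* !b */
```

Note that there is deliberately no function to read b or to post-increment through it, which mirrors the write-only nature of the register.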

I, P and S

The i and s registers cannot be read or written; they are used to keep track of the internal state of the execution. i holds the current instruction cell and s holds the slot number of the current opcode. We can, however, fetch the next slot and increment s with the @s opcode.

The p register holds the address we are fetching instructions and literals from. p cannot be read or written directly, but you can fetch (@p) and store (!p) through it with post-increment.

You also interact with all of those registers whenever you use jump, call, ex or ;.

R and T

The r and t registers hold the top of the return stack and of the working stack, respectively.

The top of the return stack can be accessed and modified with the following words: r, >r, r>.
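Unlike the working stack, the return stack is bounded, so its words can fail. The sketch below shows one way this could look. It rests on several assumptions: the names are hypothetical, r is treated as the topmost of the 256 cells rather than as an extra register, r is assumed to copy the top value without removing it, and a return value of 0 is assumed to mean "no error". The codes 1 and 2 are the ones listed in the Errors section.

```c
#include <stdint.h>

#define RS_CELLS 256u

/* 1 and 2 come from the Errors section; 0 for success is an assumption. */
enum { RS_OK = 0, RS_UNDERFLOW = 1, RS_OVERFLOW = 2 };

typedef struct {
    uint32_t cell[RS_CELLS];
    uint32_t depth;  /* number of values held; r is cell[depth - 1] */
} ReturnStack;

/* >r : move a value onto the return stack */
static int rs_push(ReturnStack *rs, uint32_t v) {
    if (rs->depth == RS_CELLS) return RS_OVERFLOW;
    rs->cell[rs->depth++] = v;
    return RS_OK;
}

/* r> : take the top value off the return stack */
static int rs_pop(ReturnStack *rs, uint32_t *out) {
    if (rs->depth == 0u) return RS_UNDERFLOW;
    *out = rs->cell[--rs->depth];
    return RS_OK;
}

/* r : copy the top value without removing it (assumed semantics) */
static int rs_peek(const ReturnStack *rs, uint32_t *out) {
    if (rs->depth == 0u) return RS_UNDERFLOW;
    *out = rs->cell[rs->depth - 1u];
    return RS_OK;
}
```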

Memory

There is 256KB of RAM, and both data and code can live in it. The RAM is accessed whenever the register a, b, or p points to an address between 0x0000 and 0xFFFF. If those same registers point to an address between 0x10000 and 0x1FFFF, they provide access to devices, which are documented in a separate document.
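The address split can be sketched as a small classifier. The Region type and the function names are hypothetical, and what happens above 0x1FFFF is not described here, so the sketch simply reports such addresses as unmapped.

```c
#include <stdint.h>

typedef enum { REGION_RAM, REGION_DEVICE, REGION_NONE } Region;

/* Decide which side of the memory map an address falls on. */
static Region region_of(uint32_t addr) {
    if (addr <= 0xFFFFu)  return REGION_RAM;     /* 0x0000 .. 0xFFFF   */
    if (addr <= 0x1FFFFu) return REGION_DEVICE;  /* 0x10000 .. 0x1FFFF */
    return REGION_NONE;  /* behaviour here is not described in the text */
}

/* Offset of the address inside its 0x10000-sized region. */
static uint32_t region_offset(uint32_t addr) {
    return addr & 0xFFFFu;
}
```

A fetch through a, b or p would then first call region_of and route the access either to the RAM array or to the device layer.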

Errors

The core can produce 4 different kinds of errors:

Return stack underflow (= 1)
Return stack overflow (= 2)
Division by zero (= 3)
Unknown Error (= 4)

Handling these execution errors is described in the System device.

Note that the last error should never happen.
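For reference, the error codes listed above map naturally onto a C enum. The sketch below also shows how a division opcode might report error 3. The identifiers are hypothetical, the value 0 for "no error" is an assumption, and the division is assumed to be unsigned, which is not specified here.

```c
#include <stdint.h>

typedef enum {
    ERR_NONE         = 0,  /* ASSUMPTION: 0 means no error */
    ERR_RS_UNDERFLOW = 1,  /* return stack underflow */
    ERR_RS_OVERFLOW  = 2,  /* return stack overflow */
    ERR_DIV_ZERO     = 3,  /* division by zero */
    ERR_UNKNOWN      = 4   /* should never happen */
} PochiError;

/* Divide n by d, reporting division by zero instead of trapping.
   Unsigned division is an assumption of this sketch. */
static PochiError checked_div(uint32_t n, uint32_t d, uint32_t *q) {
    if (d == 0u) return ERR_DIV_ZERO;
    *q = n / d;
    return ERR_NONE;
}
```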