From 03abefa8cd329359cd5c4c00e9b528c51a53904a Mon Sep 17 00:00:00 2001
From: Ryan Pendleton
Date: Sat, 13 Jan 2024 22:47:06 -0700
Subject: [PATCH] regenerate using the latest version of srcweave to introduce more stable anchors
---
 docs/index.html | 336 ++++++++++++++++++++++++------------------------
 1 file changed, 168 insertions(+), 168 deletions(-)

diff --git a/docs/index.html b/docs/index.html
index 98085f2..44ef94a 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -11,8 +11,8 @@
- -

Write your Own Virtual Machine

+ +

Write your Own Virtual Machine

By: Justin Meiners and Ryan Pendleton

@@ -25,20 +25,20 @@

Write your Own Virtual Machine

All you need to know is how to read basic C or C++ and how to do binary arithmetic.

-  1. What is a virtual machine?
-  2. LC-3 architecture
-  3. Assembly examples
-  4. Executing programs
-  5. Implementing instructions
-  6. Instruction cheat sheet
-  7. Trap routines
-  8. Trap routine cheat sheet
-  9. Loading programs
-  10. Memory mapped registers
-  11. Platform specifics
-  12. Running the VM
-  13. Alternate C++ technique
-  14. Contributions
+  1. What is a virtual machine?
+  2. LC-3 architecture
+  3. Assembly examples
+  4. Executing programs
+  5. Implementing instructions
+  6. Instruction cheat sheet
+  7. Trap routines
+  8. Trap routine cheat sheet
+  9. Loading programs
+  10. Memory mapped registers
+  11. Platform specifics
+  12. Running the VM
+  13. Alternate C++ technique
+  14. Contributions
@@ -47,7 +47,7 @@

Write your Own Virtual Machine

Each piece of code from the VM project will be shown and explained thoroughly, so you can be sure nothing is left out. The final code was created by “tangling” the blocks of code together.

-

1. What is a virtual machine?

+

What is a virtual machine?

A VM is a program that acts like a computer. It simulates a CPU along with a few other hardware components, allowing it to perform arithmetic, read and write to memory, and interact with I/O devices, just like a physical computer. Most importantly, it can understand a machine language which you can use to program it.

@@ -70,7 +70,7 @@

1. What is a virtual machine?

Another example of this behavior is demonstrated by Ethereum smart contracts. Smart contracts are small programs which are executed by each validating node in the blockchain network. This requires the node operators to run programs on their machines that have been written by complete strangers, without any opportunity to scrutinize them beforehand. To prevent contracts from doing malicious things, they are run inside a VM that has no access to the file system, network, disk, etc. Ethereum is also a good application of the portability features that result when using a VM. Since Ethereum nodes can be run on many kinds of computers and operating systems, the use of a VM allows smart contracts to be written without any consideration of the many platforms they run on.

-

2. LC-3 architecture

+

LC-3 architecture

Our VM will simulate the LC-3, an educational computer architecture commonly used to teach university students computer architecture and assembly. It has a simplified instruction set compared to x86, but demonstrates the main ideas used by modern CPUs.

@@ -85,11 +85,11 @@

Memory

-Memory Storage +Memory Storage
#define MEMORY_MAX (1 << 16)
 uint16_t memory[MEMORY_MAX];  /* 65536 locations */
 
-

Used by 1 2 3 4

+

Used by 1 2 3 4

Registers

@@ -105,7 +105,7 @@

Registers

-Registers +Registers
enum
 {
     R_R0 = 0,
@@ -121,17 +121,17 @@ 

Registers

R_COUNT };
-

Used by 1 2 3 4

+

Used by 1 2 3 4

Just like the memory, we will store the registers in an array:

-Register Storage +Register Storage
uint16_t reg[R_COUNT];
 
-

Used by 1 2 3 4

+

Used by 1 2 3 4

Instruction set

@@ -144,7 +144,7 @@

Instruction set

-Opcodes +Opcodes
enum
 {
     OP_BR = 0, /* branch */
@@ -165,7 +165,7 @@ 

Instruction set

OP_TRAP /* execute trap */ };
-

Used by 1 2 3 4

+

Used by 1 2 3 4

Note: The Intel x86 architecture has hundreds of instructions, while others such as ARM and LC-3 have very few. Small instruction sets are referred to as RISCs, while larger ones are called CISCs. Larger instruction sets typically do not provide any fundamentally new possibilities, but they often make writing assembly more convenient. A single instruction in CISC might take the place of several in RISC. However, CISC processors tend to be more complex and expensive for engineers to design and manufacture. This and other tradeoffs cause the designs to come in and out of style.

@@ -178,7 +178,7 @@

Condition flags

-Condition Flags +Condition Flags
enum
 {
     FL_POS = 1 << 0, /* P */
@@ -186,7 +186,7 @@ 

Condition flags

FL_NEG = 1 << 2, /* N */ };
-

Used by 1 2 3 4

+

Used by 1 2 3 4

Note: (The << symbol is called the left bitshift operator. (n << k) shifts the bits of n to the left k places. Thus 1 << 2 will equal 4. Read that link if you are not familiar. It will be important.)
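To make the shift operator concrete, here is a minimal sketch. The helper name `bit_mask` is hypothetical and not part of the VM; it only wraps `1 << k`, and it assumes `k` is less than 16 so the result fits in a `uint16_t`.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the VM): returns a value with only bit k set.
   Assumes k < 16 so the result fits in 16 bits. */
uint16_t bit_mask(unsigned k)
{
    return (uint16_t)(1u << k);
}
```

With this helper, `bit_mask(0)` is 1 and `bit_mask(2)` is 4, which are the values of `FL_POS` and `FL_NEG` in the enum above. Because each flag occupies its own bit, the flags can be tested independently with a bitwise AND.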

@@ -195,26 +195,26 @@

Condition flags

-/lc3.c -
@{Includes}
+/lc3.c
+
@{Includes}
 
-@{Registers}
-@{Condition Flags}
-@{Opcodes}
+@{Registers}
+@{Condition Flags}
+@{Opcodes}
 
-

3. Assembly examples

+

Assembly examples

Now let’s look at an LC-3 assembly program to get an idea of what the VM actually runs. You don’t need to know how to program in assembly or understand every detail. Just try to get a general sense of what is happening. Here is a simple “Hello World”:

-Hello World Assembly +Hello World Assembly
.ORIG x3000                        ; this is the address in memory where the program will be loaded
 LEA R0, HELLO_STR                  ; load the address of the HELLO_STR string into R0
 PUTs                               ; output the string pointed to by R0 to the console
@@ -241,7 +241,7 @@ 

3. Assembly examples

-Loop Assembly +Loop Assembly
AND R0, R0, 0                      ; clear R0
 LOOP                               ; label at the top of our loop
 ADD R0, R0, 1                      ; add 1 to R0 and store back in R0
@@ -254,7 +254,7 @@ 

3. Assembly examples

Note: Learning to write assembly is not necessary for this tutorial. However, if you are interested, you can write and assemble your own LC-3 programs using the LC-3 Tools.

-

4. Executing programs

+

Executing programs

Once again, the previous examples are just to give you an idea of what the VM does. To write a VM, you don’t need to be fluent in assembly. As long as you follow the proper procedure for reading and executing instructions, any LC-3 program will run correctly, no matter how complicated it is. In theory, it could even run a web browser or an operating system like Linux!

@@ -282,11 +282,11 @@

Procedure

-Main Loop +Main Loop
int main(int argc, const char* argv[])
 {
-    @{Load Arguments}
-    @{Setup}
+    @{Load Arguments}
+    @{Setup}
 
     /* since exactly one condition flag should be set at any given time, set the Z flag */
     reg[R_COND] = FL_ZRO;
@@ -306,58 +306,58 @@ 

Procedure

         switch (op)
         {
             case OP_ADD:
-                @{ADD}
+                @{ADD}
                 break;
             case OP_AND:
-                @{AND}
+                @{AND}
                 break;
             case OP_NOT:
-                @{NOT}
+                @{NOT}
                 break;
             case OP_BR:
-                @{BR}
+                @{BR}
                 break;
             case OP_JMP:
-                @{JMP}
+                @{JMP}
                 break;
             case OP_JSR:
-                @{JSR}
+                @{JSR}
                 break;
             case OP_LD:
-                @{LD}
+                @{LD}
                 break;
             case OP_LDI:
-                @{LDI}
+                @{LDI}
                 break;
             case OP_LDR:
-                @{LDR}
+                @{LDR}
                 break;
             case OP_LEA:
-                @{LEA}
+                @{LEA}
                 break;
             case OP_ST:
-                @{ST}
+                @{ST}
                 break;
             case OP_STI:
-                @{STI}
+                @{STI}
                 break;
             case OP_STR:
-                @{STR}
+                @{STR}
                 break;
             case OP_TRAP:
-                @{TRAP}
+                @{TRAP}
                 break;
             case OP_RES:
             case OP_RTI:
             default:
-                @{BAD OPCODE}
+                @{BAD OPCODE}
                 break;
         }
     }
-    @{Shutdown}
+    @{Shutdown}
 }
-

Used by 1 2

+

Used by 1 2

While we are at the main loop let’s handle command line input to make our program usable. @@ -365,7 +365,7 @@

Procedure

-Load Arguments +Load Arguments
if (argc < 2)
 {
     /* show usage string */
@@ -382,12 +382,12 @@ 

Procedure

} }
-

Used by 1 2 3

+

Used by 1 2 3

-

5. Implementing instructions

+

Implementing instructions

Your task now is to fill in each opcode case with a correct implementation. This is easier than it sounds. A detailed specification for each instruction is included in the project documents. The specification for each translates pretty easily to several lines of code. I will demonstrate how to implement two of them here. The code for the rest can be found in the next section.

@@ -404,7 +404,7 @@

ADD

-Add Register Assembly +Add Register Assembly
ADD R2 R0 R1 ; add the contents of R0 to R1 and store in R2.
 
@@ -419,7 +419,7 @@

ADD

-Add Immediate Assembly +Add Immediate Assembly
ADD R0 R0 1 ; add 1 to R0 and store back in R0
 
@@ -433,7 +433,7 @@

ADD

-Sign Extend +Sign Extend
uint16_t sign_extend(uint16_t x, int bit_count)
 {
     if ((x >> (bit_count - 1)) & 1) {
@@ -442,7 +442,7 @@ 

ADD

return x; }
-

Used by 1 2 3 4

+

Used by 1 2 3 4

Note: If you are interested in exactly how negative numbers can be represented in binary, you can read about Two’s Complement. However, this is not essential. You can just copy the code above and use it whenever the specification says to sign extend numbers.
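If you want to see sign extension in action, here is a small self-contained sketch. The name `sign_extend_sketch` is hypothetical, chosen so it does not clash with the tutorial's own function, and the body is my reading of the technique: if the top bit of the narrow value is set, fill every higher bit with 1s. It assumes `bit_count` is between 1 and 15 and that the bits above `bit_count` are already zero.

```c
#include <stdint.h>

/* Sketch: widen a bit_count-bit two's complement value to 16 bits.
   Assumes 1 <= bit_count <= 15 and that bits above bit_count are zero. */
uint16_t sign_extend_sketch(uint16_t x, int bit_count)
{
    /* the highest bit of the narrow field is the sign bit */
    if ((x >> (bit_count - 1)) & 1) {
        /* negative: fill the upper bits with 1s */
        x |= (uint16_t)(0xFFFFu << bit_count);
    }
    return x;
}
```

For a 5-bit field, `0x1F` (all ones, meaning -1) becomes `0xFFFF`, while `0x0F` (positive 15) is returned unchanged. Casting the result to `int16_t` gives the signed value.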

@@ -455,7 +455,7 @@

ADD

-Update Flags +Update Flags
void update_flags(uint16_t r)
 {
     if (reg[r] == 0)
@@ -472,14 +472,14 @@ 

ADD

} }
-

Used by 1 2 3 4

+

Used by 1 2 3 4

Now we are ready to write the code for the ADD case:

-ADD +ADD
{
     /* destination register (DR) */
     uint16_t r0 = (instr >> 9) & 0x7;
@@ -502,7 +502,7 @@ 

ADD

update_flags(r0); }
-

Used by 1

+

Used by 1
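To check your understanding of the ADD encoding, it can help to pull the fields out of a real instruction word. The sketch below is only an illustration: the `add_fields` struct and `decode_add` function are hypothetical names, and the bit positions are taken from the LC-3 ADD encoding (DR in bits 11 to 9, SR1 in bits 8 to 6, the mode flag in bit 5).

```c
#include <stdint.h>

/* Hypothetical struct for illustration; the tutorial decodes fields inline. */
typedef struct {
    uint16_t dr;        /* destination register, bits 11..9 */
    uint16_t sr1;       /* first operand register, bits 8..6 */
    uint16_t imm_flag;  /* bit 5: 1 means immediate mode */
    uint16_t low5;      /* bits 4..0: imm5, or SR2 in bits 2..0 */
} add_fields;

/* Split an ADD instruction word into its fields with shifts and masks. */
add_fields decode_add(uint16_t instr)
{
    add_fields f;
    f.dr = (instr >> 9) & 0x7;
    f.sr1 = (instr >> 6) & 0x7;
    f.imm_flag = (instr >> 5) & 0x1;
    f.low5 = instr & 0x1F;
    return f;
}
```

For example, `ADD R0 R0 1` assembles to `0x1021`, which decodes to DR 0, SR1 0, immediate mode, and an immediate of 1. `ADD R2 R0 R1` assembles to `0x1401`, which decodes to DR 2, SR1 0, register mode, and SR2 equal to `low5 & 0x7`, which is 1.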

This section contained a lot of information, so let’s summarize. @@ -532,7 +532,7 @@

LDI

-C LDI Sample +C LDI Sample
// the value of far_data is an address
 // of course far_data itself (the location in memory containing the address) has an address
 char* far_data = "apple";
@@ -559,7 +559,7 @@ 

LDI

-LDI +LDI
{
     /* destination register (DR) */
     uint16_t r0 = (instr >> 9) & 0x7;
@@ -570,14 +570,14 @@ 

LDI

update_flags(r0); }
-

Used by 1

+

Used by 1

As I said, this instruction shares a lot of the code and knowledge learned from ADD. You will find this is the case with the remaining instructions.
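The double lookup that LDI performs can be tried in isolation. This is a minimal sketch under a few assumptions: `demo_memory` is a plain stand-in array rather than the VM's memory (so there are no memory mapped registers), and `load_indirect` is a hypothetical helper that takes an already sign-extended offset.

```c
#include <stdint.h>

/* Stand-in for the VM's memory; a plain array with no memory mapped registers. */
static uint16_t demo_memory[1 << 16];

/* Hypothetical helper: follow one level of indirection, as LDI does.
   pc_offset is expected to be sign extended already. */
uint16_t load_indirect(uint16_t pc, uint16_t pc_offset)
{
    /* the location near the PC holds an address, not the data itself */
    uint16_t pointer_location = (uint16_t)(pc + pc_offset);
    uint16_t final_address = demo_memory[pointer_location];
    return demo_memory[final_address];
}
```

If `demo_memory[0x3005]` holds `0x4000` and `demo_memory[0x4000]` holds 42, then `load_indirect(0x3001, 4)` returns 42. A negative offset works the same way because 16-bit addition wraps around.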

You now need to go back and implement the rest of the switch cases for the instructions. Follow the specification and use the code listed here to complete the others. The code for all instructions is listed at the end of the tutorial. Two of the opcodes specified before, OP_RTI and OP_RES, will not be used. You can ignore these cases or throw an error if they are executed. After you are done, the bulk of your VM will be completed!

-

6. Instruction cheat sheet

+

Instruction cheat sheet

This section contains the full implementations of the remaining instructions if you get stuck.

@@ -588,10 +588,10 @@

RTI & RES

-BAD OPCODE +BAD OPCODE
abort();
 
-

Used by 1

+

Used by 1

Bitwise and

@@ -600,7 +600,7 @@

Bitwise and

-AND +AND
{
     uint16_t r0 = (instr >> 9) & 0x7;
     uint16_t r1 = (instr >> 6) & 0x7;
@@ -619,7 +619,7 @@ 

Bitwise and

update_flags(r0); }
-

Used by 1

+

Used by 1

Bitwise not

@@ -628,7 +628,7 @@

Bitwise not

-NOT +NOT
{
     uint16_t r0 = (instr >> 9) & 0x7;
     uint16_t r1 = (instr >> 6) & 0x7;
@@ -637,14 +637,14 @@ 

Bitwise not

update_flags(r0); }
-

Used by 1

+

Used by 1

Branch

-BR +BR
{
     uint16_t pc_offset = sign_extend(instr & 0x1FF, 9);
     uint16_t cond_flag = (instr >> 9) & 0x7;
@@ -654,7 +654,7 @@ 

Branch

} }
-

Used by 1

+

Used by 1

Jump

@@ -663,21 +663,21 @@

Jump

-JMP +JMP
{
     /* Also handles RET */
     uint16_t r1 = (instr >> 6) & 0x7;
     reg[R_PC] = reg[r1];
 }
 
-

Used by 1

+

Used by 1

Jump register

-JSR +JSR
{
     uint16_t long_flag = (instr >> 11) & 1;
     reg[R_R7] = reg[R_PC];
@@ -693,14 +693,14 @@ 

Jump register

} }
-

Used by 1

+

Used by 1

Load

-LD +LD
{
     uint16_t r0 = (instr >> 9) & 0x7;
     uint16_t pc_offset = sign_extend(instr & 0x1FF, 9);
@@ -708,14 +708,14 @@ 

Load

update_flags(r0); }
-

Used by 1

+

Used by 1

Load register

-LDR +LDR
{
     uint16_t r0 = (instr >> 9) & 0x7;
     uint16_t r1 = (instr >> 6) & 0x7;
@@ -724,14 +724,14 @@ 

Load register

update_flags(r0); }
-

Used by 1

+

Used by 1

Load effective address

-LEA +LEA
{
     uint16_t r0 = (instr >> 9) & 0x7;
     uint16_t pc_offset = sign_extend(instr & 0x1FF, 9);
@@ -739,42 +739,42 @@ 

Load effective address

update_flags(r0); }
-

Used by 1

+

Used by 1

Store

-ST +ST
{
     uint16_t r0 = (instr >> 9) & 0x7;
     uint16_t pc_offset = sign_extend(instr & 0x1FF, 9);
     mem_write(reg[R_PC] + pc_offset, reg[r0]);
 }
 
-

Used by 1

+

Used by 1

Store indirect

-STI +STI
{
     uint16_t r0 = (instr >> 9) & 0x7;
     uint16_t pc_offset = sign_extend(instr & 0x1FF, 9);
     mem_write(mem_read(reg[R_PC] + pc_offset), reg[r0]);
 }
 
-

Used by 1

+

Used by 1

Store register

-STR +STR
{
     uint16_t r0 = (instr >> 9) & 0x7;
     uint16_t r1 = (instr >> 6) & 0x7;
@@ -782,12 +782,12 @@ 

Store register

mem_write(reg[r1] + offset, reg[r0]); }
-

Used by 1

+

Used by 1

-

7. Trap routines

+

Trap routines

The LC-3 provides a few predefined routines for performing common tasks and interacting with I/O devices. For example, there are routines for getting input from the keyboard and for displaying strings to the console. These are called trap routines which you can think of as the operating system or API for the LC-3. Each trap routine is assigned a trap code which identifies it (similar to an opcode). To execute one, the TRAP instruction is called with the trap code of the desired routine.
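As a quick illustration of how a trap code travels inside a TRAP instruction, here is a small sketch. The helper names `is_trap` and `trap_code_of` are hypothetical; the sketch assumes the LC-3 encoding in which the TRAP opcode is `1111` in the top four bits and the trap code sits in the low eight bits.

```c
#include <stdint.h>

/* Hypothetical helper: true if the top four bits hold the TRAP opcode (1111). */
int is_trap(uint16_t instr)
{
    return (instr >> 12) == 0xF;
}

/* Hypothetical helper: the trap code is the low 8 bits of the instruction. */
uint16_t trap_code_of(uint16_t instr)
{
    return instr & 0xFF;
}
```

For the instruction word `0xF025`, `is_trap` is true and `trap_code_of` yields `0x25`, the code for halting the program.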

@@ -798,7 +798,7 @@

7. Trap routines

-TRAP Codes +TRAP Codes
enum
 {
     TRAP_GETC = 0x20,  /* get character from keyboard, not echoed onto the terminal */
@@ -809,7 +809,7 @@ 

7. Trap routines

TRAP_HALT = 0x25 /* halt the program */ };
-

Used by 1 2 3 4

+

Used by 1 2 3 4

You may be wondering why the trap codes are not included in the instructions. @@ -833,32 +833,32 @@

7. Trap routines

-TRAP +TRAP
reg[R_R7] = reg[R_PC];
 
 switch (instr & 0xFF)
 {
     case TRAP_GETC:
-        @{TRAP GETC}
+        @{TRAP GETC}
         break;
     case TRAP_OUT:
-        @{TRAP OUT}
+        @{TRAP OUT}
         break;
     case TRAP_PUTS:
-        @{TRAP PUTS}
+        @{TRAP PUTS}
         break;
     case TRAP_IN:
-        @{TRAP IN}
+        @{TRAP IN}
         break;
     case TRAP_PUTSP:
-        @{TRAP PUTSP}
+        @{TRAP PUTSP}
         break;
     case TRAP_HALT:
-        @{TRAP HALT}
+        @{TRAP HALT}
         break;
 }
 
-

Used by 1 2

+

Used by 1 2

As with instructions, I will show you how to implement a single trap routine and leave the rest to you.

@@ -878,7 +878,7 @@

PUTS

-TRAP PUTS +TRAP PUTS
{
     /* one char per word */
     uint16_t* c = memory + reg[R_R0];
@@ -890,12 +890,12 @@ 

PUTS

fflush(stdout); }
-

Used by 1

+

Used by 1

That’s all for this routine. The trap routines are pretty straightforward if you are familiar with C. Go back to the specification and implement the others now. As with the instructions, the full code can be found at the end of the tutorial.

-

8. Trap routine cheat sheet

+

Trap routine cheat sheet

This section contains the full implementations of the remaining trap routines.

@@ -904,30 +904,30 @@

8. Trap routine cheat sheet

-TRAP GETC +TRAP GETC
/* read a single ASCII char */
 reg[R_R0] = (uint16_t)getchar();
 update_flags(R_R0);
 
-

Used by 1

+

Used by 1

Output Character

-TRAP OUT +TRAP OUT
putc((char)reg[R_R0], stdout);
 fflush(stdout);
 
-

Used by 1

+

Used by 1

Prompt for Input Character

-TRAP IN +TRAP IN
{
     printf("Enter a character: ");
     char c = getchar();
@@ -937,14 +937,14 @@ 

8. Trap routine cheat sheet

update_flags(R_R0); }
-

Used by 1

+

Used by 1

Output String

-TRAP PUTSP +TRAP PUTSP
{
     /* one char per byte (two bytes per word)
        here we need to swap back to
@@ -961,24 +961,24 @@ 

8. Trap routine cheat sheet

fflush(stdout); }
-

Used by 1

+

Used by 1

Halt Program

-TRAP HALT +TRAP HALT
puts("HALT");
 fflush(stdout);
 running = 0;
 
-

Used by 1

+

Used by 1

-

9. Loading programs

+

Loading programs

We have mentioned a lot about loading and executing instructions from memory, but how do instructions get into memory in the first place? When an assembly program is converted to machine code, the result is a file containing an array of instructions and data. This can be loaded by just copying the contents right into an address in memory.
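To get a feel for the image layout before dealing with files, here is a sketch that loads an image from a byte buffer instead. Everything named here is hypothetical (`image_memory`, `load_image_bytes`), and it assumes the layout used by LC-3 object files: the first 16-bit word is the origin address, and every word is stored with its most significant byte first.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the VM's memory array. */
static uint16_t image_memory[1 << 16];

/* Hypothetical loader: copies an image held in a byte buffer into memory.
   Returns the number of words loaded, or 0 if the buffer has no origin word. */
size_t load_image_bytes(const uint8_t* bytes, size_t length)
{
    if (length < 2) return 0;

    /* the first word says where in memory the image belongs */
    uint16_t origin = (uint16_t)((bytes[0] << 8) | bytes[1]);

    /* never write past the end of memory */
    size_t max_words = (size_t)(1 << 16) - origin;
    size_t words = (length - 2) / 2;
    if (words > max_words) words = max_words;

    for (size_t i = 0; i < words; ++i) {
        const uint8_t* p = bytes + 2 + 2 * i;
        image_memory[origin + i] = (uint16_t)((p[0] << 8) | p[1]);
    }
    return words;
}
```

Given the bytes `30 00 F0 25 12 34`, the loader places `0xF025` at address `0x3000` and `0x1234` at `0x3001`. Assembling each word from individual bytes sidesteps the byte order of the host machine entirely.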

@@ -989,7 +989,7 @@

9. Loading programs

-Read Image File +Read Image File
void read_image_file(FILE* file)
 {
     /* the origin tells us where in memory to place the image */
@@ -1010,20 +1010,20 @@ 

9. Loading programs

} }
-

Used by 1 2 3 4

+

Used by 1 2 3 4

Notice that swap16 is called on each loaded value. LC-3 programs are big-endian, but most modern computers are little-endian. So, we need to swap each uint16 that is loaded. (If you happen to be using a big-endian computer, like an old PPC Mac, then do not swap.)

-Swap +Swap
uint16_t swap16(uint16_t x)
 {
     return (x << 8) | (x >> 8);
 }
 
-

Used by 1 2 3 4

+

Used by 1 2 3 4

Note: Endianness refers to how bytes of an integer are interpreted. In little-endian, the first byte is the least significant digit, and in big-endian, it is reversed. As far as I know, the decision is mostly arbitrary. Different companies made different decisions, so now we are left with varying implementations. You do not need to know anything else about endianness for this project.
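If you are curious whether your own machine needs the swap, the following sketch checks at run time. The helpers `host_is_little_endian` and `big_endian_to_host` are hypothetical and not used by the VM; `swap16` is repeated here only so the block stands on its own.

```c
#include <stdint.h>

/* Exchange the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Hypothetical helper: inspects the first byte of a known value in memory. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    return *(const uint8_t*)&probe == 1;
}

/* Hypothetical helper: interpret a raw big-endian word on any host. */
uint16_t big_endian_to_host(uint16_t raw)
{
    return host_is_little_endian() ? swap16(raw) : raw;
}
```

On a little-endian machine, the file bytes `30 00` read directly into a `uint16_t` appear as `0x0030`, and `swap16` turns that into the intended `0x3000`. Swapping twice always returns the original value.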

@@ -1032,7 +1032,7 @@

9. Loading programs

-Read Image +Read Image
int read_image(const char* image_path)
 {
     FILE* file = fopen(image_path, "rb");
@@ -1042,12 +1042,12 @@ 

9. Loading programs

return 1; }
-

Used by 1 2 3 4

+

Used by 1 2 3 4

-

10. Memory mapped registers

+

Memory mapped registers

Some special registers are not accessible from the normal register table. Instead, a special address is reserved for them in memory. To read and write to these registers, you just read and write to their memory location. These are called memory mapped registers. They are commonly used to interact with special hardware devices.

@@ -1058,21 +1058,21 @@

10. Memory mapped registers

-Memory Mapped Registers +Memory Mapped Registers
enum
 {
     MR_KBSR = 0xFE00, /* keyboard status */
     MR_KBDR = 0xFE02  /* keyboard data */
 };
 
-

Used by 1 2 3 4

+

Used by 1 2 3 4

Memory mapped registers make memory access a bit more complicated. We can’t read and write to the memory array directly, but must instead call setter and getter functions. When memory is read from KBSR, the getter will check the keyboard and update both memory locations.
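The getter behavior described above can be sketched without a real keyboard. In this illustration, all `sk_` and `SK_` names are hypothetical, and `sk_pending_key` is a stand-in for the keyboard: it holds a character when a key is waiting and -1 otherwise. The sketch also assumes the LC-3 convention that bit 15 of the status register signals that a key is ready.

```c
#include <stdint.h>

enum
{
    SK_KBSR = 0xFE00, /* keyboard status */
    SK_KBDR = 0xFE02  /* keyboard data */
};

static uint16_t sk_memory[1 << 16];

/* Hypothetical stand-in for the keyboard: -1 means no key is waiting. */
static int sk_pending_key = -1;

uint16_t sk_mem_read(uint16_t address)
{
    if (address == SK_KBSR) {
        if (sk_pending_key >= 0) {
            /* a key is waiting: set the ready bit and publish the character */
            sk_memory[SK_KBSR] = 1 << 15;
            sk_memory[SK_KBDR] = (uint16_t)sk_pending_key;
            sk_pending_key = -1;
        } else {
            sk_memory[SK_KBSR] = 0;
        }
    }
    return sk_memory[address];
}
```

A program polls the status address until the ready bit is set and then reads the data address to get the character. In the real VM, the stand-in variable would be replaced by an actual check of the terminal.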

-Memory Access +Memory Access
void mem_write(uint16_t address, uint16_t val)
 {
     memory[address] = val;
@@ -1095,12 +1095,12 @@ 

10. Memory mapped registers

return memory[address]; }
-

Used by 1 2 3 4

+

Used by 1 2 3 4

That completes the last component of the VM! Provided that you implemented the rest of the trap routines and instructions, you are almost ready to try it out!

-

11. Platform specifics

+

Platform specifics

This section contains some tedious details that are needed to access the keyboard and behave nicely. @@ -1113,7 +1113,7 @@

Linux/macOS/UNIX

-Input Buffering +Input Buffering
struct termios original_tio;
 
 void disable_input_buffering()
@@ -1141,14 +1141,14 @@ 

Linux/macOS/UNIX

return select(1, &readfds, NULL, NULL, &timeout) != 0; }
-

Used by 1 2

+

Used by 1 2

-Includes +Includes
#include <stdio.h>
 #include <stdint.h>
 #include <signal.h>
@@ -1161,7 +1161,7 @@ 

Linux/macOS/UNIX

 #include <sys/termios.h>
 #include <sys/mman.h>
-

Used by 1 2

+

Used by 1 2

Windows

@@ -1170,7 +1170,7 @@

Windows

-Input Buffering Windows +Input Buffering Windows
HANDLE hStdin = INVALID_HANDLE_VALUE;
 DWORD fdwMode, fdwOldMode;
 
@@ -1203,7 +1203,7 @@ 

Windows

-Windows Includes +Windows Includes
#include <stdio.h>
 #include <stdint.h>
 #include <signal.h>
@@ -1223,11 +1223,11 @@ 

All platforms

-Setup +Setup
signal(SIGINT, handle_interrupt);
 disable_input_buffering();
 
-

Used by 1 2 3

+

Used by 1 2 3

When the program is interrupted, we want to restore the terminal settings back to normal. @@ -1235,17 +1235,17 @@

All platforms

-Shutdown +Shutdown
restore_input_buffering();
 
-

Used by 1 2 3

+

Used by 1 2 3

Settings should also be restored if we receive a signal to end the program.

-Handle Interrupt +Handle Interrupt
void handle_interrupt(int signal)
 {
     restore_input_buffering();
@@ -1253,37 +1253,37 @@ 

All platforms

exit(-2); }
-

Used by 1 2 3 4

+

Used by 1 2 3 4

Everything we have written so far should have been added to the C file in the following order:

-

12. Running the VM

+

Running the VM

You can now build and run the LC-3 VM!

@@ -1322,7 +1322,7 @@

Debugging

If the program doesn’t work correctly, it is likely because you programmed an instruction incorrectly. This can be tricky to debug. I recommend reading through the assembly source code of an LC-3 program while simultaneously using a debugger to step through the VM instructions one at a time. As you read the assembly, make sure the VM goes to the instruction that you expect it to. If a discrepancy occurs, you will then know which instruction caused the issue. Reread its specification and double check your code.
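One way to make stepping easier is to print the name of each instruction as it executes. The helper below is a hypothetical addition, not part of the tutorial's code. It assumes the opcode order of the LC-3 (the same order as the opcode enum) and that the opcode lives in the top four bits of the instruction.

```c
#include <stdint.h>

/* Hypothetical debugging helper: maps an instruction word to its opcode name. */
const char* opcode_name(uint16_t instr)
{
    static const char* names[16] = {
        "BR", "ADD", "LD", "ST", "JSR", "AND", "LDR", "STR",
        "RTI", "NOT", "LDI", "STI", "JMP", "RES", "LEA", "TRAP"
    };
    /* instr >> 12 is always in the range 0..15 for a 16-bit value */
    return names[instr >> 12];
}
```

Calling something like `fprintf(stderr, "%04X %s\n", reg[R_PC], opcode_name(instr))` right after fetching an instruction produces a trace you can compare line by line against the assembly source.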

-

13. Alternate C++ technique

+

Alternate C++ technique

Here is an advanced way of organizing instructions that makes the code a whole lot shorter. @@ -1343,7 +1343,7 @@

13. Alternate C++ technique

-Instruction C++ +Instruction C++
template <unsigned op>
 void ins(uint16_t instr)
 {
@@ -1429,7 +1429,7 @@ 

13. Alternate C++ technique

     if (0x0080 & opbit) { mem_write(base_plus_off, reg[r0]); } // STR
     if (0x8000 & opbit) // TRAP
     {
-         @{TRAP}
+         @{TRAP}
     }
     //if (0x0100 & opbit) { } // RTI
     if (0x4666 & opbit) { update_flags(r0); }
@@ -1442,7 +1442,7 @@

13. Alternate C++ technique

-Op Table +Op Table
static void (*op_table[16])(uint16_t) = {
     ins<0>, ins<1>, ins<2>, ins<3>,
     ins<4>, ins<5>, ins<6>, ins<7>,
@@ -1462,7 +1462,7 @@ 

13. Alternate C++ technique

The rest of the C++ version uses the code we already wrote! The full source is here: unix, windows.

-

14. Contributions

+

Contributions

atul-g has contributed a handy reference card