Skip to content

Latest commit

 

History

History
444 lines (362 loc) · 19 KB

retrospective.org

File metadata and controls

444 lines (362 loc) · 19 KB

I recently participated in the 4th installment of langjam, a competition that makes its competitors create a programming language over a period of 48 hours. If that sounds like less time than it should take to make a good programming language, you’d be right, but I reckon that’s not really the point of the competition.

Anyway, I learned that this competition existed about half-way through the first day. I decided to look up the prompt which was “`the sound(ness) of one hand typing“`. I thought of the old adage which relates to the clapping of a single hand, and got to thinking about ways to design a programming language that was missing half of it, in such a way as to preclude its use.

The result is write-only-wanguage, or ‘wow’. Available on github here.

Seizing the means of non-interaction.

If you want to remove half of “read/write”, you can choose to either have a read-only language, which sounds boring, or a write-only language. What could a write-only language be? Someone must have made one before, but I didn’t bother to check. Can I design a language so absolutely shit that my write-only language is the world’s shittest write-only language?

I started writing a parser that would take in data files, and have no further input. But then, isn’t that what all compilers do? No, a write-only language would have to have no means of input, no reading of files, or loaded values. Let’s push this further and isolate it from the outside world entirely. It has no input, no output. That’s the rules.

We’re going to allow internal reads and comparisons internally, despite being technically reading, because I decided it was OK, and I’m making up the rules to end up with the most entertaining result.

How do we design a language that can’t be interacted with? It can’t be compiled, since the compiled code is an output, and it can’t really be interpreted since the interpreter needs an input.

But do we really need an input for an interpreted language?

If we take some inspiration from virtual machines, we can have a state machine that modifies its own state starting from a hardcoded value and proceeding from there. As such, the user does not pass in the input and our interpreter still runs. Magnificent!

A crappy state machine: designed for jankiness

We’re already designing a language with no means of interaction. How could we make this worse? Let’s go to a throwback of old. Let’s say we have a really crappy computer as the basis of our state machine.

Our state machine has an accumulator (ACC, 64-bit), an instruction register (INS, 16-bit). The choice of a 16-bit non-extendable instruction set really increases the jank. It has 256 unsigned 64-bit integers as memory, which we will call “heap”. That’s it. No stack pointer. No counter. No flags. You have to manually change the instruction register if you want to make it do things.

I’m not certain of this, but I think that not even retrocomputers would be this bad. They had to actually sell units, after all, so we’re trailblazing here. This is groundbreaking fundamental research!

Here is what most of the logic flow looks like.

// Let us define our static registers
static ACC: AtomicU64 = AtomicU64::new(0);
static INS: AtomicU16 = AtomicU16::new(0);

fn main() {
    eval_loop();
}

fn eval_loop() {
    // Here we initialize our main source of memory
    let mut heap: [u64; 256] = [0; 256];
    // Main loop
    loop {
	  let ins = INS.load(Ordering::SeqCst);
	  INS.store(0, Ordering::SeqCst);

	  // Function dispatch on instruction
	  let cont: Continuation = match ins {
	      0xFFFF => terminate(),
	      0x0000 => processor_idle_sleep(),
	      // {... more commands ...}
	      _ => break,
	  };
	  match cont {
	      Continuation::Stop => break,
	      Continuation::Continue => continue,
	  };
    }
}

The program consists of a single loop that does the following in order:

  • Fetches INS from the atomic static variable, then sets INS = 0;
  • Takes INS, matches it to a function, and then calls the function.
  • The function returns a Continuation enum, which is either Stop or Continue
  • If stop, then break the loop and terminate the program.

ACC starts off at 0, as does INS. The 0 instruction is a no-op. As such the program will loop forever and you have to kill the program to terminate it. Great language!

Functions

In order to simplify the previous code, I have omitted the functions. The whole list is available on github. Mainly, the functionality is grouped into 4 categories:

Instruction OPCode RangeFunctionality Group
0x0000, 0xFFFFSleeps for 10ms, terminates program respectively
0xA000…0xA8FFOperations that modify ACC
0xAA00…0xAB00Operations that write ACC to the heap
0xBA00…0xBCFFOperations that write to INS
0xC000…0xC3FFOperations that write to INS, depending on the value of ACC

INS is a single value, and does not allow for multiple instruction calls to be queued. As such, it would be seemingly impossible for any meaningful work to be performed.

For instance, let’s say we want to zero the ACC. One way this could happen is if INS happened to equal 0xBA04. This is the op-code for “load the value at heap memory address 04 into INS”. At address 0x04, if the value 0xA000 was written, it would have the effect of loading A000 into INS, which would then zero the ACC.

Let’s work through the control flow here.

StepPlace in codeheap[0x04]ACCINS
0Start of loop0xA000any0xBA04
1Dispatch -> set_ins()0xA000any0
2After set_ins()0xA000any0xA000
3Start of loop0xA000any0xA000
4Dispatch -> zero_acc()0xA000any0
5After zero_acc()0xA00000
6(At this point, it loops forever)0xA00000

We need a second instruction that tells the machine where the next instruction is located.

To remedy this, I have invented what is probably the pinnacle of this project.

The jammer

What if we had a friend that would just jam another instruction into INS at step 4?

Enter modern multithreaded programming. We detach a thread whose only purpose is to jam another instruction into INS as soon as possible. This basically keeps the next instruction in memory somewhere, and could be replaced by a queue, but I think this mechanism is fun. Plus, it’s definitely a write-only mechanism.

fn deferred_jam_instruction(ins: u16) {
    std::thread::spawn( move || {
	let mut cycles = 10; // Lol deadlock prevention
	while INS.load(Ordering::SeqCst) != 0 && cycles > 0 {
	    cycles -= 1; 
	    // Wait for INS = 0
	    thread::sleep(time::Duration::from_millis(2));
	}
	INS.store(ins, Ordering::SeqCst);
    });
}

The op-code BBXX, loads the instruction at (XX+1) into memory, and then creates a jammer to load BB(XX + 2) into INS if INS == 0. Beautiful. If you are wondering whether this causes problems, or has race-condition issues, the answer is yes.

We can now set ACC=2 and do the next instruction by simply having INS = BB04, and having the following values in memory:

AddressValueInstruction functionality
0x040xA000Zero ACC
0x050xBB06Set INS to value of 0x06 and start jammer for 0x07
0x060xA001Incr ACC
0x070xA001Incr ACC

We can chain jammers to keep the code moving

AddressValueInstruction functionality
0x040xA000Zero ACC
0x050xBB06Set INS to value of 0x06 and start jammer for 0x07
0x060xA001Incr ACC
0x070xBB08Set INS to value of 0x06 and start jammer for 0x07
0x080xA001Incr ACC
0x090xAA04Store ACC to 0x04

Let’s see this in action.

StepPlace in codeheap[0x04]ACCINS
0Start of loop0xA000any0xBB04
1Dispatch -> set_ins_and_jam()0xA000any0
2After set_ins_and_jam()0xA000any0xA000
3Dispatch -> zero_acc()0xA000any0xBB06
4After zero_acc()0xA00000xBB06
5Dispatch -> set_ins_and_jam()0xA00000
6After set_ins_and_jam0xA00000xA001
7(Prior to dispatch)0xA00000
8Dispatch -> incr_acc()0xA00000xBB08 (jammed in)
9After incr_acc()0xA00010xBB08
10Dispatch -> set_ins_and_jam()0xA00010
11After set_ins_and_jam()0xA00010xA001
12Dispatch -> incr_acc()0xA00010
13After incr_acc()0xA00020xAA04 (jammed in)
14Dispatch -> write_acc()0xA00020
15After write_acc()220
(At this point, it loops forever)220

Fantastic! The jammer is a very nice and not at all problematic way to do what could be done by a simple data structure. But this still doesn’t solve the main problem: how is this scenario possible if all the memory values, including ACC and INS, are set to 0?

Interacting with the uninteractive machine

So far, I have withheld one critical piece of information, which is that the program runs on a general purpose computer on which we have access to memory addresses. While this is not surprising, it does allow us to do one neat trick.

The trick is that we can write and read to values in memory. We can also interrupt the control flow. In fact, most programmers have done this at some point in their life, through a tool that exploits this same flaw; the debugger.

Hooking up the debugger and writing a program

We are going to be using rust-gdb for this example, though other debuggers would certainly work.

rust-gdb target/debug/write-only

For this example, let us go through how we would run a predetermined program, then we’ll go over what the program does.

We are going to set a couple breakpoints. The first one is intended to interrupt before the assignment of the value of INT to the temporary variable int. The second one is intended to intercept calls to write_acc so that we can change the value of ACC and thus write whatever we want into memory. The third catches the program when it is terminated, allowing us to read the value of ACC.

Let’s set-up the breakpoints, run the program and continue until the first breakpoint.

b 23
b write_acc
b terminate
r
c

At the first breakpoint, we can now write our instruction to INS = 0xAB00, and continue on until we reach the second breakpoint.

set INS.v.value = 0xAB00
c

This will hit the dispatch table and end up running the function write_acc_all.

  fn write_acc_all(heap:&mut [u64; 256]) -> Continuation {
    for index in 0..0xFF {
	write_acc(heap, index);
    }
    Continuation::Continue
}

For every iteration of the loop, we have one call to write_acc. This is where we placed our second breakpoint.

  fn write_acc(heap:&mut [u64; 256], ins: u16) -> Continuation {
    let index: usize = (ins & 0xFF).into();
    heap[index] = ACC.load(Ordering::SeqCst);
    Continuation::Continue
}

If we set ACC before every call, it will load whatever value we want into memory.

set ACC.v.value = 0
c

This will continue until the next iteration of write_acc_all. We can proceed to mass assign our instructions and variables.

set ACC.v.value = 1
c
set ACC.v.value = 1000
c
set ACC.v.value = 0xBB04
c
set ACC.v.value = 0xA000
c
set ACC.v.value = 0xBB06
c
set ACC.v.value = 0xA100
c
set ACC.v.value = 0xBB08
c
set ACC.v.value = 0xA101
c
set ACC.v.value = 0xBB0A
c
set ACC.v.value = 0xAA00
c
set ACC.v.value = 0xBB0C
c
set ACC.v.value = 0xA000
c
set ACC.v.value = 0xBB0E
c
set ACC.v.value = 0xA101
c
set ACC.v.value = 0xBB10
c
set ACC.v.value = 0xA001
c
set ACC.v.value = 0xBB12
c
set ACC.v.value = 0xA001
c
set ACC.v.value = 0xBB14
c
set ACC.v.value = 0xAA01
c
set ACC.v.value = 0xC316
c
set ACC.v.value = 100
c
set ACC.v.value = 0xBA05
c
set ACC.v.value = 0xBB19
c
set ACC.v.value = 0xA000
c
set ACC.v.value = 0xBB1B
c
set ACC.v.value = 0xA100
c
set ACC.v.value = 0xFFFF
c

We have many more loops, but we don’t have anything to write, so let’s clear the second breakpoint and continue.

 cl write_acc
c 

We will hit the first breakpoint again. There is nothing left to do but launch our program.

  set INS.v.value = 0xBA03
cl 23
c

This will run until we hit termination. The value in ACC is the one we care about so

print ACC.v.value

And we are done.

Um, what does that program do?

Well, it’s supposed to sum up all the odd numbers below 100, but I found that the limit is 255 before it enters an infinite loop. It also does not output the correct value, and I haven’t had time to figure out why it didn’t work during the 48-hour allocated period.

I think this deserves extra points. The language is so utterly shite that even its creator couldn’t produce an example that runs properly.

The version hosted on my personal github should have a version that works, at some point in the future.

Here’s what the memory, and each address does

Addr-hexValueEffect
000Storage, summation variable “sum”
011Increment “inc”
021000–unused, kept to not mess up line numbers-
030xBB04Set INS to the value at 0x04, and jam at 0x05
040xA000Zero ACC
050xBB06
060xA100Add “sum” to ACC
070xBB08
080xA101Add “inc” to ACC
090xBB0A
0A0xAA00Set “sum” to ACC
0B0xBB0C
0C0xA000Set ACC to 0
0D0xBB0E
0E0xA101Add “inc” to ACC
0F0xBB10
100xA001Add 1 to ACC
110xBB12
120xA001Add 1 to ACC
130xBB14
140xAA01Set “inc” to ACC
150xC316Compare ACC to the value in 16, if false, set INS to value of 17, if true set INS to val of 18
16100Limit for increment,
170xBA05Set INS to value of 05, which is 0xBB06
180xBB19
190xA000Zero ACC
1A0xBB1B
1B0xA100Add “sum” to ACC
1C0xFFFFTerminate

Or, if you write it in pseudocode

s = 0
inc = 1
while True:
  # Doing "sum = sum + inc"
  acc = 0
  acc += s
  acc += inc
  sum = acc

  # Doing "inc = inc + 2"
  acc = 0
  acc += inc
  acc += 1
  acc += 1
  inc = acc

  if inc > 100:
    break
acc = 0
acc = s

This gives the right result in Python.

¯\_(ツ)_/¯

Work left to do

There are some exploits to the language, mainly that I don’t want people to write to the heap directly. It’s too convenient, making the language half-usable. As such, there needs to be some indirection nonsense to make this more inconvenient than the method I presented before. I have not yet conceptualized what I intend for this.