Atindra Jha (atj10), Rupert Lu (rupertlu), Rohan Sanda (rsanda)
EE 180: Digital Systems Architecture (Winter 2024)
24 January 2024
MIPS (Microprocessor without Interlocked Pipeline Stages) is a well-known instruction set architecture (ISA) developed by John Hennessy. The MIPS ISA is characterized by its simplicity, as it specifies just three kinds of 32-bit instructions. In this lab, our goal was to translate a C implementation of Radixsort – a popular non-comparative sorting algorithm – into MIPS assembly code following the MIPS ISA. We approached this process from the bottom-up: starting first with implementing smaller helper functions as leaf procedures before tackling more complex parent functions. We ultimately produced a robust implementation of Radixsort in MIPS assembly that can efficiently sort up to about 5500 nonnegative, unsigned 32-bit integers. In the following sections, we outline our methods, issues, testing procedures, and results.
We started by implementing the leaf procedures: copy_array
and find_exp
(arrcpy
and find_exp
in radixsort.s
respectively). For arrcpy
, we did not allocate space on the stack to store any registers (and load/restore them later) as there are no function calls in the body of copy_array
. We simply initialized i
to zero using the move
instruction and then checked if the termination condition (of the for loop) was met using the bge
(branch if greater than or equal to) statement to compare i
(stored in register $t0
) and n
(supplied as an argument to copy_array
and stored in $a2
). We set up the code such that it would branch to the part where it jumps back to its return address (the end of the function) in case the termination condition was met (i.e. i >= n
). Otherwise, it would continue executing the lines of instructions in sequential order and copy the element at address src + (4 x i)
to the address dst + (4 x i)
in the destination array. The base addresses of the src and dst arrays are arguments to the function, and are stored in registers $a0
and $a1
respectively. Following the MIPS calling convention, we did not use “s” registers in this procedure as they are callee-saved registers and copy_array
is not a caller and thus does not have a callee.
For find_exp
, we did not allocate space on the stack or use “s” registers for the same reasons highlighted above. We simply used the load word lw
instruction to load the array[0]
element into the $t0
register (which represents the “largest” variable). The base address of “array” is supplied to this function as its first argument and can thus be found in register $a0
. Our for-loop initialization was similar to what we did for copy_array
: we used the move
instruction to initialize i
(represented by the $t1
register) to 0 using the $zero
register. We used a branch greater than instruction (bge
) to check if the termination condition was met (i.e. i >= n
) and if so, branched to the end of the for loop at a point right before the while loop. The body of the for loop was fairly trivial except the part where we used another branch greater than (bge
) instruction to check if largest <= array[i]
in order to update the value of largest ($t0
). The implementation of while was similar to that of the for loop in that we used a branch instruction (blt
) to check if the termination condition was met. An interesting operation that was not present in copy_array
or the first half of find_exp
was dividing largest by RADIX to get the quotient. We used the divu
instruction to device register $t0
by $t1
(we had loaded the constant 10 into the $t1
register using the li
instruction to use it as RADIX). We then use the mflo
instruction to move the quotient to $t0
(updating the value of largest). We learnt that the quotient of a division can be retrieved using mflo
and the remainder, using mfhi
. We used the same jr $ra
instruction at the end of the function to jump back to the caller function.
The implementation of radsort
was more difficult as this function was not a leaf procedure – in fact, it was recursive. Thus, we began by allocating space on the stack to store the current "s" registers (relevant for recursive calls to radsort). Following the for-loop optimizations discussed in class, we set up our logic for handling the base case by first having a jump instruction to a condition flag (where we checked if n < 2 || exp == 0
). Next, we implemented the malloc()
calls similar to how they were translated in the main()
flag in the provided starter code. After initializing the contents of the children_len
array to be zero in memory, we translated the bucket assignment for-loop. This was a more interesting translation as we learned about the mflo
and mfhi
registers that we used to implement division and modulus, respectively. We also implemented the nested memory addressing (children[sort_index][children_len[sort_index]] = array[i]
) by adding four times the value stored at children[sort_index]
to four times the value stored at children_len[sort_index]
. A final interesting implementation point comes from when we perform caller-saving and restoration of the "a" and "t" registers before and after recursing to radsort
, respectively.
We soon realized that we did not quite understand the proper conventions to make a procedure call, especially when we were trying to manage our caller/callee-saved registers while simultaneously dealing with the complexities of implementing recursion. This became very apparent when we decided to run our code for the first time, resulting in thousands of “Exception 7 [Bad Data Address]”. The error turned out to be due to us attempting to perform caller saving after a branch condition, meaning that our “loads” and “saves” were not aligned correctly, and this ended up propagating through our recursive calls, overloading us with the same error messages.
Another confusing aspect of dealing with both recursion and caller/callee-saved registers at the same time was deciding when and where to save our registers. For example, in our main radix sort function, we had an $s6
register before the recursive call that we did not expect to use again afterwards. However, we ended up adding one line that uses the value of $s6
after the recursive call, overlooking the fact that we had originally determined the $s6
as not needing to be saved by the callee. This did not raise any exceptions, however, our actual output ended up being incorrectly sorted and contained repeated numbers. Walking through our code again multiple times and ensuring that all our variables contained the proper values enabled us to identify this critical error.
Our final critical error was that within one of our loops, we expected a register’s value to change on every iteration. However, within that loop, we had a branch condition, and our register-updating only occurred after one specific condition. As a result, we were not properly updating the registers that we were expecting to be updated in the rest of the code, leading to a one-line error that again outputted the wrong results. The method to fix this error was similar to the previous-–go through the code again and ensure that all variables/registers contain the proper values we intend for them to have.
Having made all these fixes, our MIPS assembly implementation of radix sort now successfully sorts integers from a value of 0 all the way up to the maximum unsigned 4-byte integer value of 4,294,967,295. Our code is also able to sort up to around ~5000 numbers before it runs out of memory.
In closing, we translate a C implementation of the Radixsort algorithm into assembly following the MIPS ISA. We find that our implementation, which consists of approximately 200 added lines to the starter code, can efficiently sort thousands of nonnegative 32-bit unsigned integers. We executed a robust testing procedure, developing our own random generator scripts to create large test cases. Using the provided comparator script, we find no difference in the output between our MIPS assembly and the C implementation of Radixsort.
We encountered a variety of bugs in our development process. These included performing caller register saving after a branch condition, issues with caller versus callee register saving, and additional typographical errors. After this lab, we are far more confident in our understanding of the MIPS ISA.