Basilisk II Core Emulation Analysis
"Emulation core" refers to the logic that translates M68k CPU guest instructions into non-M68k CPU host instructions (Intel x86, AMD64, ARM and PPC). Besides CPU core emulation, we have a separate page describing the emulation of 68k Macintosh peripheral hardware such as the timer, Ethernet, audio, etc. Most of the following emulation analysis is based on the study of an AMD64 Linux host, but it should apply to other host architectures and operating systems.
The facts described below are purely based on:
If it is on the Internet it must be true.
Basilisk II CPU emulation was started by Christian Bauer. The initial source code has its roots in another M68k emulation project, the Amiga emulator UAE. By performing a diff between the early commit 2bebaceabc7646d in the macemu git repo and the UAE v0.8.10 source code, we find that the files build68k.c, cpuopti.c, gencpu.c and table68k are nearly identical to those in UAE. For further reading on UAE, you can view the UAE People section in the WinUAE documentation.
Based on the commit history, Gwénolé Beauchesne is the key contributor to Basilisk II CPU emulation. He added JIT translation (a.k.a dynamic binary translation) to speed up emulation. TODO -- Add more when read JIT code.
For emulation on non-M68k host CPUs, the source code is under the src/uae_cpu folder.
TODO -- Add overview of Glue/Adapter, UAE CPU, FPU and JIT.
There are two different perspectives in terms of memory addressing in emulation.
The first one is from the host OS point of view. An emulation program such as Basilisk II runs as an application in ring 3 user space. Most modern host CPUs, such as Intel x86, AMD64, PPC and ARM, contain an MMU, which may provide segmentation and paging. Most modern host OSes, such as Linux, Mac OS X and Windows, use virtual addressing instead of direct physical addressing of memory.
For 32-bit CPUs, the CPU can in theory access up to 2^32 bytes (4GB) of virtual memory. A 64-bit CPU can in theory address up to 2^64 bytes, far more than any current machine has installed. However, this doesn’t mean that applications can use an arbitrary virtual address. This usually depends on the CPU architecture and the host OS implementation. For example, 32-bit Linux by default sets aside the lower 3GB for user space and the upper 1GB for kernel space [1].
The second perspective is from the guest Macintosh OS point of view. In theory, the guest OS doesn’t know if it is running under a physical M68k CPU or an emulated CPU provided by BII. Therefore, BII needs to provide memory address mapping between the guest OS and BII's user space memory in the host OS when executing translated instructions.
According to the Wikipedia page on M68k series CPUs [2], only the 68030 and later M68k series CPUs have a built-in paged MMU. In addition, Apple added virtual memory features in System 7. TODO -- Investigate if BII emulates the PMMU. Try to enable virtual memory in memory manager under control panel.
In terms of the address mapping provided by Basilisk II, there are three different types: direct addressing, real addressing and virtual addressing. By default, the configure script determines the proper address mapping strategy for you. If you know better than the automatic detection, you can override it by passing the --enable-addressing option to the ./configure script. It accepts the values direct, real and banks. (Note that the banks option refers to virtual addressing.) You can also see the selected addressing mode in the summary printed after running ./configure:
...
Assembly optimizations ................. : x86-64
Addressing mode ........................ : direct
Bad memory access recovery type ........ : siginfo
...
It is interesting to analyze the configure.ac script to figure out how the optimal addressing strategy is automatically determined. The selection logic depends on several tests of the host OS and platform.
- Check whether the OS supports VOSF, a.k.a. Video on Segmentation Fault (for the technical details of VOSF, please refer to [3]). VOSF support in turn depends on whether the OS supports a segmentation fault signal handler. This is determined by several compilation tests in the src/CrossPlatform folder, controlled by macros. As long as any of the tests below passes under Linux/Mac OS X, the variable CAN_VOSF is set to ‘yes’.
/*Test 1: Check if OS supports extended signal handlers via asm*/
#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif
#define HAVE_ASM_UCONTEXT 1
#define HAVE_SIGINFO_T 1
#define CONFIGURE_TEST_SIGSEGV_RECOVERY
#include "../CrossPlatform/vm_alloc.cpp"
#include "../CrossPlatform/sigsegv.cpp"
/*Test 2: Check if OS supports extended signal handlers*/
#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif
#define HAVE_SIGINFO_T 1
#define CONFIGURE_TEST_SIGSEGV_RECOVERY
#include "../CrossPlatform/vm_alloc.cpp"
#include "../CrossPlatform/sigsegv.cpp"
/*Test 3: Check if OS supports the hack*/
#define HAVE_SIGCONTEXT_SUBTERFUGE 1
#define CONFIGURE_TEST_SIGSEGV_RECOVERY
#include "../CrossPlatform/vm_alloc.cpp"
#include "../CrossPlatform/sigsegv.cpp"
- Check whether the OS allows page zero mapping in user space, or supports another related page zero hack. Because of NULL pointer dereference security concerns, modern Linux kernels disable page zero mapping by default, but Mac OS X still supports it.
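To make the first check more concrete, here is a minimal sketch of the kind of recovery an extended signal handler makes possible, assuming a Linux host. It is not the code from sigsegv.cpp, and the names on_segv and probe_sigsegv_recovery are hypothetical. The handler receives the faulting address through siginfo_t, unprotects the page, and lets the faulting store retry.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* expose MAP_ANONYMOUS on glibc */
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t fault_count = 0;
static void *volatile fault_addr = NULL;
static long page_size = 0;

/* Extended handler (SA_SIGINFO): si_addr carries the faulting address. */
static void on_segv(int sig, siginfo_t *info, void *context)
{
    (void)sig;
    (void)context;
    fault_count++;
    fault_addr = info->si_addr;

    /* Round down to the page boundary and make the page writable.
     * mprotect is not formally async-signal-safe, but this pattern is
     * widely used on Linux; treat it as an assumption of this sketch. */
    uintptr_t page = (uintptr_t)info->si_addr & ~((uintptr_t)page_size - 1);
    if (mprotect((void *)page, (size_t)page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(2);
}

/* Returns 1 if a write fault was caught and recovered, 0 if not,
 * and -1 if the probe could not be set up. */
int probe_sigsegv_recovery(void)
{
    struct sigaction sa;

    page_size = sysconf(_SC_PAGESIZE);
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    /* A read-only page: the first write to it must fault. */
    void *mem = mmap(NULL, (size_t)page_size, PROT_READ,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    volatile char *p = mem;
    p[0] = 'z'; /* faults once, the handler unprotects, the store retries */

    int ok = (fault_count == 1 && fault_addr == mem && p[0] == 'z');

    munmap(mem, (size_t)page_size);
    signal(SIGSEGV, SIG_DFL);
    return ok;
}
```

Calling probe_sigsegv_recovery() from main and checking for a return value of 1 plays the same role as the configure tests: it tells you whether the host can recover from a bad memory access.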
In the end, the configure script uses the result of the tests:
- If native M68k CPU, it uses real addressing.
- If page zero mapping is supported and the host is able to do VOSF, it uses real addressing.
- If page zero mapping is not supported, but it is able to do VOSF, it uses direct addressing.
- Otherwise, it uses memory banks a.k.a virtual addressing.
See the parts of configure.ac that determine the addressing mode.
Common data structures defined in src/uae_cpu/cpu_emulation.h:
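The decision rules listed above can be summarized as a small function. This is only a sketch of the logic in C; configure.ac implements it in shell, and the type, function and parameter names here are hypothetical.

```c
#include <stdbool.h>

typedef enum { ADDR_REAL, ADDR_DIRECT, ADDR_BANKS } addressing_mode;

/* Mirrors the four rules above, evaluated in order. */
addressing_mode select_addressing(bool native_m68k,
                                  bool can_map_page_zero,
                                  bool can_vosf)
{
    if (native_m68k)
        return ADDR_REAL;          /* native M68k CPU */
    if (can_map_page_zero && can_vosf)
        return ADDR_REAL;          /* page zero mapping plus VOSF */
    if (can_vosf)
        return ADDR_DIRECT;        /* VOSF only */
    return ADDR_BANKS;             /* fallback: memory banks */
}

/* Names as they appear in the --enable-addressing option. */
const char *addressing_name(addressing_mode mode)
{
    switch (mode) {
    case ADDR_REAL:   return "real";
    case ADDR_DIRECT: return "direct";
    default:          return "banks";
    }
}
```

For a modern Linux host, where VOSF works but page zero mapping is disabled, select_addressing(false, false, true) yields ADDR_DIRECT, which matches the configure summary shown earlier.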
// RAM and ROM pointers
uint32 RAMBaseMac = 0; // RAM base (Mac address space) gb-- initializer is important
uint8 *RAMBaseHost; // RAM base (host address space)
uint32 RAMSize; // Size of RAM
uint32 ROMBaseMac; // ROM base (Mac address space)
uint8 *ROMBaseHost; // ROM base (host address space)
uint32 ROMSize; // Size of ROM
Common utility functions defined in src/uae_cpu/cpu_emulation.h:
static inline uint32 ReadMacInt32(uint32 addr) {return get_long(addr);}
static inline uint32 ReadMacInt16(uint32 addr) {return get_word(addr);}
static inline uint32 ReadMacInt8(uint32 addr) {return get_byte(addr);}
static inline void WriteMacInt32(uint32 addr, uint32 l) {put_long(addr, l);}
static inline void WriteMacInt16(uint32 addr, uint32 w) {put_word(addr, w);}
static inline void WriteMacInt8(uint32 addr, uint32 b) {put_byte(addr, b);}
static inline uint8 *Mac2HostAddr(uint32 addr) {return get_real_address(addr);}
static inline uint32 Host2MacAddr(uint8 *addr) {return get_virtual_address(addr);}
static inline void *Mac_memset(uint32 addr, int c, size_t n) {return memset(Mac2HostAddr(addr), c, n);}
static inline void *Mac2Host_memcpy(void *dest, uint32 src, size_t n) {return memcpy(dest, Mac2HostAddr(src), n);}
static inline void *Host2Mac_memcpy(uint32 dest, const void *src, size_t n) {return memcpy(Mac2HostAddr(dest), src, n);}
static inline void *Mac2Mac_memcpy(uint32 dest, uint32 src, size_t n) {return memcpy(Mac2HostAddr(dest), Mac2HostAddr(src), n);}
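The M68k is big-endian, while hosts such as x86-64 are little-endian, so accessors like get_long and put_long have to present guest memory in big-endian byte order. The sketch below shows one portable way to do that over a plain host buffer. The helper names read_be32, write_be32 and read_be16 are hypothetical; they are not the UAE implementations, which may use byte-swap instructions instead.

```c
#include <stdint.h>

/* Read a 32-bit big-endian value from host memory, byte by byte. */
static inline uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a 16-bit big-endian value from host memory. */
static inline uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* Store a 32-bit value in big-endian order, most significant byte first. */
static inline void write_be32(uint8_t *p, uint32_t value)
{
    p[0] = (uint8_t)(value >> 24);
    p[1] = (uint8_t)(value >> 16);
    p[2] = (uint8_t)(value >> 8);
    p[3] = (uint8_t)value;
}
```

Because the helpers work one byte at a time, they behave the same on little-endian and big-endian hosts and do not require aligned addresses.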
Direct addressing allocates memory in the host OS for the guest OS's RAM and ROM.
TODO -- Change the following into sequence diagram later.
vm_acquire_mac(RAMSize + 0x100000) -- why is the ROM size only 1MB (0x100000)?
=>
vm_acquire(size, VM_MAP_DEFAULT | VM_MAP_32BIT) -- from CrossPlatform VM allocator wrapper.
=>
vm_acquire calls an OS-specific user space memory allocation function and sets the read/write flag.
For example, on Linux the vm_acquire function calls mmap [4] to map the /dev/zero file at the starting address 0x10000000 (NOTE: the virtual address in the host OS starts at 256MB) with the requested size. This creates a chunk of zero-initialized memory in the host OS. Then vm_acquire calls mprotect to give the allocated virtual memory read/write permission.
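Here is a minimal sketch of that mmap plus mprotect sequence, assuming a Linux host. It is not the vm_alloc.cpp code: it uses MAP_ANONYMOUS instead of opening /dev/zero (both give zero-filled pages), passes 0x10000000 only as a placement hint, and the function names are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* expose MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Reserve `size` bytes of zero-filled memory, preferably near 256MB,
 * then grant read/write access. Returns NULL on failure. */
void *acquire_guest_memory(size_t size)
{
    void *hint = (void *)0x10000000; /* a hint only; the kernel may pick another address */
    void *mem = mmap(hint, size, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    if (mprotect(mem, size, PROT_READ | PROT_WRITE) != 0) {
        munmap(mem, size);
        return NULL;
    }
    return mem;
}

/* Give the memory back to the host OS. Returns 0 on success. */
int release_guest_memory(void *mem, size_t size)
{
    return munmap(mem, size);
}
```

The returned pointer plays the role of RAMBaseHost in the discussion that follows.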
The mapping of direct addressing between host and guest is simple and efficient.
Host: RAM starts at whatever address the vm_acquire_mac function returns. Let’s denote it as RAMBaseHost. ROM starts at RAMBaseHost + RAMSize.
Guest: RAM starts at 0. ROM starts at RAMSize.
As you can easily tell, the mapping between host and guest applies a fixed offset, RAMBaseHost. In my tests, I found that 64-bit Linux and Linux on ARMv7 use direct addressing. You can run the pmap command on the BasiliskII process to confirm the code analysis against actual memory usage at runtime. In my BII preferences, 1GB of RAM is specified. According to pmap, roughly 1GB of memory is allocated with the read/write flag starting at virtual address 0x10000000.
[Ricky@gtx Unix]$ cat ~/.basilisk_ii_prefs | grep ramsize
ramsize 1073741824
[Ricky@gtx Unix]$ ps -A | grep BasiliskII
27151 pts/1 00:00:01 BasiliskII
[Ricky@gtx Unix]$ pmap 27151
27151: ./BasiliskII
0000000010000000 1048640K rw--- [ anon ]
0000000050010000 1896K r---- [ anon ]
0000000078000000 2160K rwx-- BasiliskII
000000007821c000 544K rwx-- [ anon ]
000000007a1ef000 2920K rw--- [ anon ]
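The fixed-offset mapping described above can be sketched in a few lines of C. The struct and function names are hypothetical stand-ins for RAMBaseHost, RAMSize, Mac2HostAddr and Host2MacAddr, and the sketch leaves out any bounds checking.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *ram_base_host; /* host address where guest address 0 lives */
    uint32_t ram_size;      /* guest RAM size in bytes */
    uint32_t rom_size;      /* guest ROM size in bytes */
} guest_memory;

/* Guest -> host: add the fixed offset. */
static inline uint8_t *guest_to_host(const guest_memory *m, uint32_t addr)
{
    return m->ram_base_host + addr;
}

/* Host -> guest: subtract the fixed offset. */
static inline uint32_t host_to_guest(const guest_memory *m, const uint8_t *p)
{
    return (uint32_t)(p - m->ram_base_host);
}

/* In the guest, ROM starts right after RAM. */
static inline uint32_t rom_base_guest(const guest_memory *m)
{
    return m->ram_size;
}
```

With the pmap output above, ram_base_host would be 0x10000000, so guest address 0 corresponds to host address 0x10000000.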
One side note: as a user space program, the text segment of BasiliskII is mapped above 0x70000000. The trick is done by the linker script [5] defined in the src/Unix/ldscripts folder. See the -T option in the link command:
...
g++ -o BasiliskII -Wl,-T,ldscripts/linux-x86_64.ld obj/main.o obj/prefs.o obj/prefs_items.o obj/sys_unix.o obj/rom_patches.o …
...
Real addressing requires the host CPU to access host memory from 0x0000 to 0x2000. As mentioned above, for security reasons modern Linux disables access to page zero by default.
Real addressing allocates memory in the host OS for the guest OS's RAM and ROM at a contiguous location:
TODO -- Change the following into sequence diagram later.
vm_acquire_mac_fixed(0, RAMSize + 0x100000)
=>
vm_acquire_fixed(addr, size, VM_MAP_DEFAULT | VM_MAP_32BIT)
=>
vm_acquire_fixed calls an OS-specific user space memory allocation function and sets the read/write flag.
Compared to direct addressing, real addressing is very straightforward, with zero overhead in terms of address translation. An address in the guest CPU maps to the same address in the host CPU, so no address mapping is needed.
A native M68k CPU can use real addressing. But just for fun, let’s do an experiment to trick Basilisk II into real addressing mode on a modern Linux host.
First, we explicitly set vm.mmap_min_addr to 0 via sysctl in Linux so that we can map page zero in user space.
[root@gtx vm]# echo 0 > /proc/sys/vm/mmap_min_addr
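Before re-running configure, a small probe can confirm that the change took effect. The sketch below, which assumes a Linux host, tries to map the first 0x2000 bytes of the address space with MAP_FIXED and writes one byte through the result. It is not the configure.ac test program, and the function name can_map_page_zero is hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* expose MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Returns 1 if the low 0x2000 bytes can be mapped and written, 0 otherwise. */
int can_map_page_zero(void)
{
    const size_t size = 0x2000;

    /* MAP_FIXED asks for exactly address 0. With vm.mmap_min_addr > 0,
     * an unprivileged process is expected to get MAP_FAILED here. */
    void *mem = mmap((void *)0, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (mem == MAP_FAILED)
        return 0;

    /* The pointer comes from mmap's return value, not from a literal 0. */
    volatile char *lm = mem;
    lm[0] = 'z';
    int ok = (lm[0] == 'z');

    munmap(mem, size);
    return ok;
}
```

A return value of 1 means page zero mapping is allowed for this process. A return value of 0 means vm.mmap_min_addr, or some other policy, is still blocking it.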
Secondly, run configure
./configure --enable-sdl-video --enable-sdl-audio --disable-jit-compiler --with-x --with-gtk --enable-addressing=real
However, it shows that the addressing uses memory_banks instead. We already know that Linux can do VOSF, so it must be the test that sets the $ac_cv_can_map_lm variable that failed. I extracted the test program from configure.ac and ran the following to compile and execute it. I got an illegal instruction error instead of a segfault. Here is the test program source code, and below are the compilation and test results:
g++ -o conftest -g -O2 -I/usr/include/SDL -D_GNU_SOURCE=1 -D_REENTRANT conftest.cpp -lm -lrt -lrt -lSDL -lpthread
./conftest
Illegal instruction (core dumped)
Here is the more interesting finding. I ran gdb and found that it failed at lm[0] = ‘z’. After disassembling the binary, I found that GCC optimization is at fault.
.text:00000000004006C0 call _Z16vm_acquire_fixedPvmi ; vm_acquire_fixed(void *,ulong,int)
.text:00000000004006C5 test eax, eax
.text:00000000004006C7 js short loc_4006D3
.text:00000000004006C9 mov byte ptr ds:0, 0
.text:00000000004006D1 ud2
GCC assumes that, since lm is a NULL pointer, the assignment through it can never validly execute (dereferencing a NULL pointer is undefined behavior in C and C++). That’s why you see the ud2 undefined instruction, which raises an invalid opcode exception.
If you remove the -O2 optimization flag from the compilation, you will get a successful run.
g++ -o conftest -g -I/usr/include/SDL -D_GNU_SOURCE=1 -D_REENTRANT conftest.cpp -lm -lrt -lrt -lSDL -lpthread
./conftest
Here is the assembly from the unoptimized binary:
.text:0000000000400A8F loc_400A8F: ; CODE XREF: main+1Aj
.text:0000000000400A8F mov edx, 2
.text:0000000000400A94 mov esi, 2000h
.text:0000000000400A99 mov edi, 0
.text:0000000000400A9E call _Z16vm_acquire_fixedPvmi ; vm_acquire_fixed(void *,ulong,int)
.text:0000000000400AA3 shr eax, 1Fh
.text:0000000000400AA6 test al, al
.text:0000000000400AA8 jz short loc_400AB4
.text:0000000000400AAA mov edi, 1 ; status
.text:0000000000400AAF call _exit
.text:0000000000400AB4 ; ---------------------------------------------------------------------------
.text:0000000000400AB4
.text:0000000000400AB4 loc_400AB4: ; CODE XREF: main+3Fj
.text:0000000000400AB4 mov rax, [rbp+var_8]
.text:0000000000400AB8 mov byte ptr [rax], 7Ah ; Ricky comment - 7Ah is ‘z’ ASCII
.text:0000000000400ABB mov rax, [rbp+var_8]
Therefore, to work around this, you need to add the compile flag -O0.
CFLAGS=-O0;CPPFLAGS=-O0 ./configure --enable-sdl-video --enable-sdl-audio --disable-jit-compiler --with-x --with-gtk --enable-addressing=real
…
Assembly optimizations ................. : x86-64
Addressing mode ........................ : real
Bad memory access recovery type ........ : siginfo
...
Build and run BasiliskII in real mode
[Ricky@gtx zero_mem]$ ps -A | grep BasiliskII
30607 pts/2 00:00:00 BasiliskII
[Ricky@gtx zero_mem]$ pmap 30607
30607: ./BasiliskII
0000000000000000 1048576K rw--- [ anon ]
0000000040425000 64K rw--- [ anon ]
0000000040435000 1896K r---- [ anon ]
Finally, BasiliskII can run in real addressing mode with a workaround.
TODO
TODO
TODO