Basilisk II Core Emulation Analysis
- Table of Contents
- History
- Source Code
- Addressing
- Static Analysis
- Dynamic Analysis
- Bibliography
"Emulation core" refers to the logic that translates M68k guest CPU instructions into instructions for a non-M68k host CPU (Intel x86, AMD64, ARM and PPC). Besides CPU core emulation, a separate page describes the emulation of 68k Macintosh peripheral hardware such as the timer, Ethernet and audio. Most of the following analysis is based on a study of an AMD64 Linux host, but it should apply to other host architectures and operating systems.
The facts described below are purely based on:
If it is on the Internet it must be true.
Basilisk II CPU emulation was started by Christian Bauer. The initial source code has its roots in UAE, an M68k Amiga emulation project. A diff between the early commit 2bebaceabc7646d in the macemu git repo and the UAE v0.8.10 source code shows that the files build68k.c, cpuopti.c, gencpu.c and table68k are nearly identical to those in UAE. For further reading on UAE, you can view the UAE People Section in the WinUAE documentation.
Based on the commit history, Gwénolé Beauchesne is the key contributor to Basilisk II CPU emulation. He added JIT translation (a.k.a dynamic binary translation) to speed up emulation. TODO -- Add more when read JIT code.
For emulation on non-M68k host CPUs, the source code is in the src/uae_cpu folder.
TODO -- Add overview of Glue/Adapter, UAE CPU, FPU and JIT.
There are two different perspectives in terms of memory addressing in emulation.
The first one is from the host OS point of view. An emulation program such as Basilisk II runs as an application in ring 3 user space. Most modern host CPUs, such as Intel x86, AMD64, PPC and ARM, contain an MMU, which may provide segmentation and paging. Most modern host OSes, such as Linux, Mac OS X and Windows, use virtual addressing instead of direct physical addressing of memory.
A 32 bit CPU can in theory access up to 2^32 bytes (4GB) of virtual memory. A 64 bit CPU can address a vastly larger space, up to 2^64 bytes in theory. However, this doesn't mean that applications can use an arbitrary virtual address; what is usable depends on the CPU architecture and the host OS implementation. For example, 32 bit Linux by default sets aside the lower 3GB for user space and the upper 1GB for kernel space [1].
The second perspective is from the guest Macintosh OS point of view. In theory, the guest OS doesn’t know if it is running under a physical M68k CPU or an emulated CPU provided by BII. Therefore, BII needs to provide memory address mapping between the guest OS and BII's user space memory in the host OS when executing translated instructions.
According to the Wikipedia page on M68k series CPUs [2], only the 68030 and later M68k series CPUs have a built-in paged MMU. In addition, Apple added virtual memory features to System 7. TODO -- Investigate if BII emulates the PMMU. Try to enable virtual memory in the Memory control panel.
In terms of the address mapping provided by Basilisk II, there are three different types: direct addressing, real addressing and virtual addressing. By default, the configure script determines the proper address mapping strategy for you. If you know better than the automatic detection, you can override it by passing the --enable-addressing option to the ./configure script. It accepts the values direct, real and banks. (Note that the banks value refers to virtual addressing.) You can also see the selected addressing mode in the summary printed after running ./configure:
...
Assembly optimizations ................. : x86-64
Addressing mode ........................ : direct
Bad memory access recovery type ........ : siginfo
...
It is interesting to analyze the configure.ac script to figure out how the optimal addressing strategy is automatically determined. The selection logic depends on several tests of the host OS and platform.
- Check whether the OS supports VOSF, a.k.a. Video on Segmentation Fault (for the technical details of VOSF, please refer to [3]). VOSF support in turn depends on whether the OS supports a segmentation fault signal handler. This is probed by several compilation tests in the src/CrossPlatform folder, each controlled by macros. As long as any of the tests below passes under Linux/Mac OS X, configure sets the variable CAN_VOSF to 'yes'.
/*Test 1: Check if OS supports extended signal handlers via asm*/
#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif
#define HAVE_ASM_UCONTEXT 1
#define HAVE_SIGINFO_T 1
#define CONFIGURE_TEST_SIGSEGV_RECOVERY
#include "../CrossPlatform/vm_alloc.cpp"
#include "../CrossPlatform/sigsegv.cpp"
/*Test 2: Check if OS supports extended signal handlers*/
#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif
#define HAVE_SIGINFO_T 1
#define CONFIGURE_TEST_SIGSEGV_RECOVERY
#include "../CrossPlatform/vm_alloc.cpp"
#include "../CrossPlatform/sigsegv.cpp"
/*Test 3: Check if OS supports the hack*/
#define HAVE_SIGCONTEXT_SUBTERFUGE 1
#define CONFIGURE_TEST_SIGSEGV_RECOVERY
#include "../CrossPlatform/vm_alloc.cpp"
#include "../CrossPlatform/sigsegv.cpp"
- Check whether the OS allows page zero mapping in user space, or supports a related page zero hack. Because of security concerns about NULL pointer dereferences, modern Linux kernels disable page zero mapping by default. Mac OS X still supports it.
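The "extended signal handler" probed by the first check can be illustrated with a small Linux sketch. It is not taken from sigsegv.cpp: the function and variable names are hypothetical, and it assumes a Linux host where a write to a read-only page raises SIGSEGV. The handler receives the faulting address through siginfo_t, unprotects the page, and returns so that the faulting store is retried, which is the basic mechanism VOSF builds on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical names for illustration only.
static volatile sig_atomic_t g_fault_count = 0;
static volatile uintptr_t g_fault_addr = 0;
static size_t g_page_size = 0;

// Extended (SA_SIGINFO) handler: the kernel passes the faulting address in si_addr.
static void on_segv(int, siginfo_t* info, void*) {
    const uintptr_t addr = reinterpret_cast<uintptr_t>(info->si_addr);
    const uintptr_t page = addr & ~(static_cast<uintptr_t>(g_page_size) - 1);
    g_fault_addr = addr;
    g_fault_count = g_fault_count + 1;
    // Unprotect the page so the faulting store succeeds when it is retried.
    if (mprotect(reinterpret_cast<void*>(page), g_page_size,
                 PROT_READ | PROT_WRITE) != 0) {
        _exit(1);  // cannot recover; avoid an endless fault loop
    }
}

// Returns true if a write to a read-only page was caught and recovered.
bool write_fault_is_recoverable() {
    g_page_size = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    g_fault_count = 0;

    struct sigaction sa = {};
    struct sigaction old_sa = {};
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old_sa) != 0) return false;

    void* p = mmap(nullptr, g_page_size, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        sigaction(SIGSEGV, &old_sa, nullptr);
        return false;
    }

    volatile uint8_t* bytes = static_cast<volatile uint8_t*>(p);
    bytes[10] = 'z';  // faults once; the handler unprotects the page

    const uintptr_t base = reinterpret_cast<uintptr_t>(p);
    const bool ok = g_fault_count == 1 &&
                    g_fault_addr >= base && g_fault_addr < base + g_page_size &&
                    bytes[10] == 'z';

    munmap(p, g_page_size);
    sigaction(SIGSEGV, &old_sa, nullptr);
    return ok;
}
```

A VOSF-style frame buffer would, under the same assumptions, also record which page was touched so that only dirty pages are redrawn.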
In the end, the configure script combines the results of the tests:
- If the host has a native M68k CPU, it uses real addressing.
- If page zero mapping is supported and the host is able to do VOSF, it uses real addressing.
- If page zero mapping is not supported, but the host is able to do VOSF, it uses direct addressing.
- Otherwise, it uses memory banks a.k.a. virtual addressing.
See the parts of configure.ac that determine the addressing mode.
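The decision list above can be condensed into a small function. This is only a sketch of the rules as described on this page, not a transcription of configure.ac; the enum, the function names and the parameter names are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical names; configure.ac expresses the same rules in shell/m4.
enum class Addressing { Real, Direct, Banks };

Addressing choose_addressing(bool native_m68k, bool can_map_page_zero, bool can_vosf) {
    if (native_m68k) return Addressing::Real;                    // rule 1
    if (can_map_page_zero && can_vosf) return Addressing::Real;  // rule 2
    if (can_vosf) return Addressing::Direct;                     // rule 3
    return Addressing::Banks;                                    // fallback
}

// Maps the result to the value names accepted by --enable-addressing.
const char* addressing_name(Addressing mode) {
    switch (mode) {
        case Addressing::Real:   return "real";
        case Addressing::Direct: return "direct";
        case Addressing::Banks:  return "banks";
    }
    return "unknown";
}
```

For a default modern Linux host (no page zero mapping, VOSF available), choose_addressing(false, false, true) yields direct addressing, which matches the configure summary shown earlier.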
Common data structures defined in src/uae_cpu/cpu_emulation.h:
// RAM and ROM pointers
uint32 RAMBaseMac = 0; // RAM base (Mac address space) gb-- initializer is important
uint8 *RAMBaseHost; // RAM base (host address space)
uint32 RAMSize; // Size of RAM
uint32 ROMBaseMac; // ROM base (Mac address space)
uint8 *ROMBaseHost; // ROM base (host address space)
uint32 ROMSize; // Size of ROM
Common utility functions defined in src/uae_cpu/cpu_emulation.h:
static inline uint32 ReadMacInt32(uint32 addr) {return get_long(addr);}
static inline uint32 ReadMacInt16(uint32 addr) {return get_word(addr);}
static inline uint32 ReadMacInt8(uint32 addr) {return get_byte(addr);}
static inline void WriteMacInt32(uint32 addr, uint32 l) {put_long(addr, l);}
static inline void WriteMacInt16(uint32 addr, uint32 w) {put_word(addr, w);}
static inline void WriteMacInt8(uint32 addr, uint32 b) {put_byte(addr, b);}
static inline uint8 *Mac2HostAddr(uint32 addr) {return get_real_address(addr);}
static inline uint32 Host2MacAddr(uint8 *addr) {return get_virtual_address(addr);}
static inline void *Mac_memset(uint32 addr, int c, size_t n) {return memset(Mac2HostAddr(addr), c, n);}
static inline void *Mac2Host_memcpy(void *dest, uint32 src, size_t n) {return memcpy(dest, Mac2HostAddr(src), n);}
static inline void *Host2Mac_memcpy(uint32 dest, const void *src, size_t n) {return memcpy(Mac2HostAddr(dest), src, n);}
static inline void *Mac2Mac_memcpy(uint32 dest, uint32 src, size_t n) {return memcpy(Mac2HostAddr(dest), Mac2HostAddr(src), n);}
The following call sequence allocates memory in the host OS for the guest OS's RAM and ROM.
TODO -- Change the following into sequence diagram later.
vm_acquire_mac(RAMSize + 0x100000) -- why ROM size only 1MB 0x100000?
=>
vm_acquire(size, VM_MAP_DEFAULT | VM_MAP_32BIT) -- from CrossPlatform VM allocator wrapper.
=>
vm_acquire calls the OS specific user space memory allocation function and sets the read/write flags.
For example, on Linux the vm_acquire function calls mmap [4] to map the /dev/zero file at the starting address 0x10000000 (NOTE: the host virtual address therefore starts at 256MB) with the requested size. This creates a zero-initialized region of memory in the host OS. Then vm_acquire calls mprotect to give the allocated virtual memory read/write permission.
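The mmap plus mprotect sequence can be sketched as follows. This is not the actual vm_acquire from vm_alloc.cpp: the helper names are hypothetical, and the sketch swaps in an anonymous mapping (MAP_ANONYMOUS) for the /dev/zero file mapping, which also yields zero-filled pages on Linux. The address 0x10000000 is passed only as a hint, so the kernel is free to place the region elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>

// Hypothetical helper, assuming a Linux host.
void* acquire_guest_memory(size_t size) {
    // Preferred start address; without MAP_FIXED this is only a hint.
    void* hint = reinterpret_cast<void*>(0x10000000);

    // Reserve the region with no access rights first.
    void* p = mmap(hint, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return nullptr;

    // Then grant read/write permission, as the text describes for mprotect.
    if (mprotect(p, size, PROT_READ | PROT_WRITE) != 0) {
        munmap(p, size);
        return nullptr;
    }
    return p;
}

void release_guest_memory(void* p, size_t size) {
    if (p != nullptr) munmap(p, size);
}
```

A caller would request RAMSize plus the ROM area in one call, so that RAM and ROM end up contiguous in host memory.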
The mapping between host and guest in direct addressing is simple and efficient.
Host: RAM starts at whatever address the vm_acquire_mac function returns. Let’s denote it as RAMBaseHost. ROM starts at RAMBaseHost + RAMSize.
Guest: RAM starts at 0. ROM starts at RAMSize.
As you can easily tell, the mapping between host and guest applies a fixed difference: RAMBaseHost. In my tests, I found that 64 bit Linux and Linux on ARMv7 use direct addressing. You can run the pmap command on the BasiliskII process to confirm the code analysis against actual memory usage at runtime. In my BII preferences, 1GB of RAM is specified. According to pmap, about 1GB of memory is allocated with the read/write flag starting at virtual address 0x10000000.
[Ricky@gtx Unix]$ cat ~/.basilisk_ii_prefs | grep ramsize
ramsize 1073741824
[Ricky@gtx Unix]$ ps -A | grep BasiliskII
27151 pts/1 00:00:01 BasiliskII
[Ricky@gtx Unix]$ pmap 27151
27151: ./BasiliskII
0000000010000000 1048640K rw--- [ anon ]
0000000050010000 1896K r---- [ anon ]
0000000078000000 2160K rwx-- BasiliskII
000000007821c000 544K rwx-- [ anon ]
000000007a1ef000 2920K rw--- [ anon ]
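The fixed-difference mapping of direct addressing can be sketched in a few lines. The struct and its member names are hypothetical; in Basilisk II the same role is played by the RAMBaseHost pointer together with helpers such as Mac2HostAddr and Host2MacAddr.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of direct addressing: guest RAM starts at 0 and guest
// ROM starts at ram_size, both backed by one contiguous host region.
struct DirectMap {
    uint8_t* ram_base_host;  // host address of guest address 0
    uint32_t ram_size;
    uint32_t rom_size;

    uint32_t rom_base_mac() const { return ram_size; }

    // Guest to host: add the fixed difference.
    uint8_t* mac_to_host(uint32_t mac_addr) const {
        assert(mac_addr < ram_size + rom_size);
        return ram_base_host + mac_addr;
    }

    // Host to guest: subtract the fixed difference.
    uint32_t host_to_mac(const uint8_t* host_addr) const {
        return static_cast<uint32_t>(host_addr - ram_base_host);
    }
};
```

With the pmap output above, ram_base_host would correspond to 0x10000000.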
Real addressing requires the host CPU to access host memory from 0x0000 to 0x2000. Basilisk II, as a user space application, needs to relocate its text segment and all other data segments so that they do not conflict with the pre-allocated guest OS memory. The trick is done by a linker script [5] defined in the src/Unix/ldscripts folder. See the -T option in the link command.
...
g++ -o BasiliskII -Wl,-T,ldscripts/linux-x86_64.ld obj/main.o obj/prefs.o obj/prefs_items.o obj/sys_unix.o obj/rom_patches.o …
...
Very few host OS and architecture combinations can run in real addressing. But at that time, real addressing was by far the fastest address mapping scheme in emulation. That's why the original programmers tried very hard to overcome the page zero problem. We will explore this later on a modern AMD64 Linux host.
The following allocates memory in host OS for guest OS's RAM and ROM in contiguous location:
TODO -- Change the following into sequence diagram later.
vm_acquire_mac_fixed(0, RAMSize + 0x100000)
=>
vm_acquire_fixed(addr, size, VM_MAP_DEFAULT | VM_MAP_32BIT)
=>
vm_acquire_fixed calls the OS specific user space memory allocation function and sets the read/write flags.
Compared to direct addressing, real addressing is very straightforward and has zero overhead in terms of address translation. An address in the guest CPU maps to the same address in the host CPU, so no address mapping is needed.
A native M68k host can use real addressing. But just for fun, let’s do an experiment to trick Basilisk II into real addressing mode on a modern Linux host.
First, we explicitly set vm.mmap_min_addr to 0 in Linux so that we can map page zero in user space:
[root@gtx vm]# echo 0 > /proc/sys/vm/mmap_min_addr
Second, run configure:
./configure --enable-sdl-video --enable-sdl-audio --disable-jit-compiler --with-x --with-gtk --enable-addressing=real
However, configure reports that the addressing mode is memory banks instead. We already know that Linux can do VOSF, so the test that sets the $ac_cv_can_map_lm variable must have failed. I extracted the test program from configure.ac, then compiled and ran it as shown below. I got an illegal instruction error instead of a segfault. Here is the test program source code, and below is the compilation and test result:
g++ -o conftest -g -O2 -I/usr/include/SDL -D_GNU_SOURCE=1 -D_REENTRANT conftest.cpp -lm -lrt -lrt -lSDL -lpthread
./conftest
Illegal instruction (core dumped)
Here is the more interesting finding. I ran gdb and found that it failed at lm[0] = 'z'. After disassembling the binary, I found that GCC optimization is at fault.
.text:00000000004006C0 call _Z16vm_acquire_fixedPvmi ; vm_acquire_fixed(void *,ulong,int)
.text:00000000004006C5 test eax, eax
.text:00000000004006C7 js short loc_4006D3
.text:00000000004006C9 mov byte ptr ds:0, 0
.text:00000000004006D1 ud2
GCC knows that lm is a NULL pointer at this point and treats the store through it as undefined behavior, so it does not generate the normal assignment. That’s why you see the ud2 undefined instruction, which raises an invalid opcode exception.
If you remove the -O2 optimization flag from the compilation, you will get a successful run.
g++ -o conftest -g -I/usr/include/SDL -D_GNU_SOURCE=1 -D_REENTRANT conftest.cpp -lm -lrt -lrt -lSDL -lpthread
./conftest
Here is the assembly from the unoptimized binary:
.text:0000000000400A8F loc_400A8F: ; CODE XREF: main+1Aj
.text:0000000000400A8F mov edx, 2
.text:0000000000400A94 mov esi, 2000h
.text:0000000000400A99 mov edi, 0
.text:0000000000400A9E call _Z16vm_acquire_fixedPvmi ; vm_acquire_fixed(void *,ulong,int)
.text:0000000000400AA3 shr eax, 1Fh
.text:0000000000400AA6 test al, al
.text:0000000000400AA8 jz short loc_400AB4
.text:0000000000400AAA mov edi, 1 ; status
.text:0000000000400AAF call _exit
.text:0000000000400AB4 ; ---------------------------------------------------------------------------
.text:0000000000400AB4
.text:0000000000400AB4 loc_400AB4: ; CODE XREF: main+3Fj
.text:0000000000400AB4 mov rax, [rbp+var_8]
.text:0000000000400AB8 mov byte ptr [rax], 7Ah ; Ricky comment - 7Ah is ‘z’ ASCII
.text:0000000000400ABB mov rax, [rbp+var_8]
Therefore, to work around this you need to add the -O0 compile flag to disable GCC optimization.
CFLAGS=-O0;CPPFLAGS=-O0 ./configure --enable-sdl-video --enable-sdl-audio --disable-jit-compiler --with-x --with-gtk --enable-addressing=real
…
Assembly optimizations ................. : x86-64
Addressing mode ........................ : real
Bad memory access recovery type ........ : siginfo
...
Build and run BasiliskII in real mode
[Ricky@gtx zero_mem]$ ps -A | grep BasiliskII
30607 pts/2 00:00:00 BasiliskII
[Ricky@gtx zero_mem]$ pmap 30607
30607: ./BasiliskII
0000000000000000 1048576K rw--- [ anon ]
0000000040425000 64K rw--- [ anon ]
0000000040435000 1896K r---- [ anon ]
Finally, BasiliskII can run in real addressing mode with a workaround.
Host memory pre-allocation in virtual addressing is the same as in direct addressing. The difference lies in the initialization of memory banks, a data structure unique to virtual addressing.
TODO -- Change the following into sequence diagram later.
InitAll(vmdir) -- from src/main.cpp
=>
Init680x0() -- from src/uae_cpu/basilisk_glue.cpp. Specify RAMBaseMac=0, ROMBaseMac address based on ROM type.
=>
memory_init(void) -- from src/uae_cpu/memory.cpp. It initializes mem_banks array and compute diff. See details below.
The memory_init function computes the address differences between host and guest for RAM, ROM and the frame buffer.
In addition, it initializes the mem_banks array, an array of structs that each contain a group of memory read/write function pointers.
We will revisit mem_banks in the next section.
The mapping mechanism of virtual addressing a.k.a memory banks is the most complicated one among all three addressing modes.
Let’s first review some basic facts in host and guest OS.
Host: as in direct addressing, RAM starts at whatever address the vm_acquire_mac function returns. Let’s denote it as RAMBaseHost. ROM starts at RAMBaseHost + RAMSize.
Guest: RAM starts at RAMBaseMac, which is 0. ROM starts at ROMBaseMac, an address that depends on the ROM type.
Now we come back to the discussion of mem_banks array data structure.
Among M68k CPUs, the 68020 and above support 32 bit addressing, while the 68010 and below support 24 bit addressing. In the Basilisk II implementation, virtual addressing divides the address space into equal sized banks. The size of each bank is 2^16 bytes = 64KB.
In 32 bit addressing, the upper 16 bits of the address (bits 31 to 16), i.e. the virtual address shifted right by 16, are used as the bank index. The maximum number of banks is therefore also 2^16 = 65536. Each bank can be used as RAM, ROM or frame buffer. For each bank, Basilisk II assigns a read/write policy during host memory initialization. For example, ROM is read only, and RAM and ROM each use their own address difference between host and guest. See details in src/uae_cpu/memory.h.
Host memory initialization is done in the memory_init function. First it fills the whole mem_banks array with a default struct of dummy memory access function pointers. Then, based on the RAM, frame buffer and ROM ranges and on whether 24 bit or 32 bit addressing is in use, it fills the mem_banks array with the corresponding memory access logic.
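The bank lookup can be sketched with a reduced model. The code below is a simplification and not the contents of src/uae_cpu/memory.h: it handles only byte accesses, all identifiers are hypothetical, the RAM and ROM sizes are tiny, and the ROM base address 0x40800000 is an assumed value chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One bank: a pair of access functions (the real struct has more entries).
struct Bank {
    uint8_t (*bget)(uint32_t addr);
    void (*bput)(uint32_t addr, uint8_t value);
};

constexpr uint32_t kBankShift = 16;               // 64KB per bank
constexpr uint32_t kBankSize = 1u << kBankShift;
constexpr size_t kBankCount = size_t{1} << 16;    // 65536 banks cover 32 bits
constexpr uint32_t kRomBaseMac = 0x40800000;      // assumed ROM base

static std::vector<uint8_t> g_ram(2 * kBankSize, 0);   // guest RAM at address 0
static std::vector<uint8_t> g_rom(kBankSize, 0xAB);    // guest ROM, filled with 0xAB

static uint8_t dummy_bget(uint32_t) { return 0; }
static void dummy_bput(uint32_t, uint8_t) {}
static uint8_t ram_bget(uint32_t a) { return g_ram[a]; }
static void ram_bput(uint32_t a, uint8_t v) { g_ram[a] = v; }
static uint8_t rom_bget(uint32_t a) { return g_rom[a - kRomBaseMac]; }
static void rom_bput(uint32_t, uint8_t) {}              // ROM ignores writes

static const Bank kDummyBank = {dummy_bget, dummy_bput};
static const Bank kRamBank = {ram_bget, ram_bput};
static const Bank kRomBank = {rom_bget, rom_bput};

// Every bank starts out as the dummy bank.
static std::vector<const Bank*> g_banks(kBankCount, &kDummyBank);

inline uint32_t bank_index(uint32_t addr) { return addr >> kBankShift; }

void map_banks(const Bank* bank, uint32_t start, uint32_t size) {
    assert(start % kBankSize == 0 && size % kBankSize == 0);
    for (uint32_t i = 0; i < (size >> kBankShift); ++i)
        g_banks[bank_index(start) + i] = bank;
}

void memory_init_sketch() {
    map_banks(&kRamBank, 0, static_cast<uint32_t>(g_ram.size()));
    map_banks(&kRomBank, kRomBaseMac, static_cast<uint32_t>(g_rom.size()));
}

// Every guest access pays for one table lookup and one indirect call.
uint8_t get_byte(uint32_t addr) { return g_banks[bank_index(addr)]->bget(addr); }
void put_byte(uint32_t addr, uint8_t v) { g_banks[bank_index(addr)]->bput(addr, v); }
```

The table lookup plus indirect call on every access is the additional overhead that direct and real addressing avoid.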
Basilisk II virtual addressing is NOT a software PMMU emulation [6], although it looks similar to one-level memory paging. Virtual addressing is just the way Basilisk II manages the mapping between guest and host. I doubt that you can enable the virtual memory option in the System 7 control panel without problems. TODO Test this in Basilisk II. I have a hunch that the Basilisk II emulation core doesn’t translate 68030 PMMU related instructions, such as PTEST, PLOAD, PFLUSH and PFLUSHA. TODO come back to this later when reading BII CPU instruction emulation code.
By default, Mac OS X on Intel uses virtual addressing. You can also force Linux to use virtual addressing through configure. As you can see, virtual addressing comes with additional overhead. But I can hardly tell the performance difference on an Intel Core i7‑2700K 3.5 GHz CPU.
TODO
TODO